Prefaces


Preface to the English Edition 


An entire generation of mathematicians has grown up during the time be- 
tween the appearance of the first edition of this textbook and the publication 
of the fourth edition, a translation of which is before you. The book is famil- 
iar to many people, who either attended the lectures on which it is based or 
studied out of it, and who now teach others in universities all over the world. 
I am glad that it has become accessible to English-speaking readers. 

This textbook consists of two parts. It is aimed primarily at university 
students and teachers specializing in mathematics and natural sciences, and 
at all those who wish to see both the rigorous mathematical theory and 
examples of its effective use in the solution of real problems of natural science. 

Note that Archimedes, Newton, Leibniz, Euler, Gauss, and Poincaré, who are
held in particularly high esteem by us mathematicians, were more than mere
mathematicians. They were scientists, natural philosophers. In mathematics,
resolving important specific questions and developing an abstract general
theory are processes as inseparable as inhaling and exhaling. Upsetting
this balance leads to problems that sometimes become significant both in
mathematical education and in science in general.

The textbook presents classical analysis as it is today, as an integral part
of unified mathematics, in its interrelations with other modern mathematical
courses such as algebra, differential geometry, differential equations, and
complex and functional analysis.

Rigor of discussion is combined with the development of the habit of 
working with real problems from natural sciences. The course exhibits the 
power of concepts and methods of modern mathematics in exploring spe- 
cific problems. Various examples and numerous carefully chosen problems, 
including applied ones, form a considerable part of the textbook. Most of the 
fundamental mathematical notions and results are introduced and discussed 
along with information concerning their history, current state, and creators.
In accordance with the orientation toward natural sciences, special attention 
is paid to informal exploration of the essence and roots of the basic concepts 
and theorems of calculus, and to the demonstration of numerous, sometimes 
fundamental, applications of the theory. 

For instance, the reader will encounter here the Galilean and Lorentz
transforms, the formula for rocket motion and the operation of a nuclear
reactor, Euler’s theorem on homogeneous functions and the dimensional analysis
of physical quantities, the Legendre transform and Hamiltonian equations
of classical mechanics, elements of hydrodynamics and Carnot’s theorem
from thermodynamics, Maxwell’s equations, the Dirac delta-function,
distributions and the fundamental solutions, convolution and mathematical 
models of linear devices, Fourier series and the formula for discrete coding 
of a continuous signal, the Fourier transform and the Heisenberg uncertainty 
principle, differential forms, de Rham cohomology and potential fields, the 
theory of extrema and the optimization of a specific technological process, 
numerical methods and processing the data of a biological experiment, the 
asymptotics of the important special functions, and many other subjects. 

Within each major topic the exposition is, as a rule, inductive, sometimes 
proceeding from the statement of a problem and suggestive heuristic consider- 
ations concerning its solution, toward fundamental concepts and formalisms. 
Detailed at first, the exposition becomes more and more compressed as the 
course progresses. Beginning ab ovo, the book leads to the most up-to-date
state of the subject.

Note also that, at the end of each of the volumes, one can find the list 
of the main theoretical topics together with the corresponding simple, but 
nonstandard problems (taken from the midterm exams), which are intended
to enable the reader both to determine his or her degree of mastery of the
material and to apply it creatively in concrete situations.

More complete information on the book and some recommendations for 
its use in teaching can be found below in the prefaces to the first and second 
Russian editions. 


Moscow, 2003 V. Zorich 


Preface to the Fourth Russian Edition


The time elapsed since the publication of the third edition has been too short 
for me to receive very many new comments from readers. Nevertheless, some 
errors have been corrected and some local alterations of the text have been 
made in the fourth edition. 


Moscow, 2002 V. Zorich 


Preface to the Third Russian Edition


This first part of the book is being published after the more advanced Part 
2 of the course, which was issued earlier by the same publishing house. For 
the sake of consistency and continuity, the format of the text follows that 
adopted in Part 2. The figures have been redrawn. All the misprints that 
were noticed have been corrected, several exercises have been added, and the 
list of further readings has been enlarged. More complete information on the 
subject matter of the book and certain characteristics of the course as a whole 
are given below in the preface to the first edition. 


Moscow, 2001 V. Zorich 


Preface to the Second Russian Edition 


In this second edition of the book, along with an attempt to remove the
misprints that occurred in the first edition,¹ certain alterations in the exposition
have been made (mainly in connection with the proofs of individual
theorems), and some new problems have been added, of an informal nature as a
rule.

The preface to the first edition of this course of analysis (see below) con- 
tains a general description of the course. The basic principles and the aim 
of the exposition are also indicated there. Here I would like to make a few 
remarks of a practical nature connected with the use of this book in the 
classroom. 

Usually both the student and the teacher make use of a text, each for his 
own purposes. 

At the beginning, both of them want most of all a book that contains,
along with the necessary theory, as wide a variety of substantial examples
of its applications as possible, and, in addition, explanations, historical and
scientific commentary, and descriptions of interconnections and perspectives
for further development. But when preparing for an examination, the student
mainly hopes to see the material that will be on the examination. The teacher
likewise, when preparing a course, selects only the material that can and must
be covered in the time allotted for the course.


1 No need to worry: in place of the misprints that were corrected in the plates
of the first edition (which were not preserved), one may be sure that a host of
new misprints will appear, which so enliven, as Euler believed, the reading of a
mathematical text.

In this connection, it should be kept in mind that the text of the present 
book is noticeably more extensive than the lectures on which it is based. What 
caused this difference? First of all, the lectures have been supplemented by 
essentially an entire problem book, made up not so much of exercises as sub- 
stantive problems of science or mathematics proper having a connection with 
the corresponding parts of the theory and in some cases significantly extend- 
ing them. Second, the book naturally contains a much larger set of examples 
illustrating the theory in action than one can incorporate in lectures. Third 
and finally, a number of chapters, sections, or subsections were consciously 
written as a supplement to the traditional material. This is explained in the 
sections “On the introduction” and “On the supplementary material” in the 
preface to the first edition. 

I would also like to recall that in the preface to the first edition I tried to 
warn both the student and the beginning teacher against an excessively long 
study of the introductory formal chapters. Such a study would noticeably 
delay the analysis proper and cause a great shift in emphasis. 

To show what in fact can be retained of these formal introductory chap- 
ters in a realistic lecture course, and to explain in condensed form the syllabus 
for such a course as a whole while pointing out possible variants depending 
on the student audience, at the end of the book I give a list of problems 
from the midterm exam, along with some recent examination topics for the 
first two semesters, to which this first part of the book relates. From this list 
the professional will of course discern the order of exposition, the degree of 
development of the basic concepts and methods, and the occasional invocation
of material from the second part of the textbook when the topic under
consideration is already accessible for the audience in a more general form.²

In conclusion I would like to thank colleagues and students, both known 
and unknown to me, for reviews and constructive remarks on the first edition 
of the course. It was particularly interesting for me to read the reviews of 
A. N. Kolmogorov and V.I. Arnol’d. Very different in size, form, and style, 
these two have, on the professional level, so many inspiring things in common. 


Moscow, 1997 V. Zorich 


2 Some of the transcripts of the corresponding lectures have been published and I
give formal reference to the booklets published using them, although I understand
that they are now available only with difficulty. (The lectures were given and
published for limited circulation in the Mathematical College of the Independent
University of Moscow and in the Department of Mechanics and Mathematics of
Moscow State University.)


From the Preface to the First Russian Edition


The creation of the foundations of the differential and integral calculus by 
Newton and Leibniz three centuries ago appears even by modern standards 
to be one of the greatest events in the history of science in general and 
mathematics in particular. 

Mathematical analysis (in the broad sense of the word) and algebra have 
intertwined to form the root system on which the ramified tree of modern 
mathematics is supported and through which it makes its vital contact with 
the nonmathematical sphere. It is for this reason that the foundations of 
analysis are included as a necessary element of even modest descriptions of 
so-called higher mathematics; and it is probably for that reason that so many 
books aimed at different groups of readers are devoted to the exposition of 
the fundamentals of analysis.

This book has been aimed primarily at mathematicians desiring (as is 
proper) to obtain thorough proofs of the fundamental theorems, but who are 
at the same time interested in the life of these theorems outside of mathe- 
matics itself. 

The characteristics of the present course connected with these circum- 
stances reduce basically to the following: 


In the exposition. Within each major topic the exposition is as a rule induc- 
tive, sometimes proceeding from the statement of a problem and suggestive 
heuristic considerations toward its solution to fundamental concepts and for- 
malisms. 

Detailed at first, the exposition becomes more and more compressed as 
the course progresses.

An emphasis is placed on the efficient machinery of smooth analysis. In 
the exposition of the theory I have tried (to the extent of my knowledge) to 
point out the most essential methods and facts and avoid the temptation of 
a minor strengthening of a theorem at the price of a major complication of 
its proof. 

The exposition is geometric throughout wherever this seemed worthwhile 
in order to reveal the essence of the matter. 

The main text is supplemented with a rather large collection of examples, 
and nearly every section ends with a set of problems that I hope will sig- 
nificantly complement even the theoretical part of the main text. Following 
the wonderful precedent of Pólya and Szegő, I have often tried to present 
a beautiful mathematical result or an important application as a series of
problems accessible to the reader.

The arrangement of the material was dictated not only by the architecture
of mathematics in the sense of Bourbaki, but also by the position of analysis 
as a component of a unified mathematical or, one should rather say, natural- 
science/mathematical education. 


In content. This course is being published in two books (Part 1 and Part 2). 

The present Part 1 contains the differential and integral calculus of
functions of one variable and the differential calculus of functions of several
variables.

In differential calculus we emphasize the role of the differential as a linear 
standard for describing the local behavior of the variation of a variable. In ad- 
dition to numerous examples of the use of differential calculus to study func- 
tional relations (monotonicity, extrema) we exhibit the role of the language 
of analysis in writing simple differential equations - mathematical models of 
real-world phenomena and the substantive problems connected with them. 

We study a number of such problems (for example, the motion of a body of 
variable mass, a nuclear reactor, atmospheric pressure, motion in a resisting 
medium) whose solution leads to important elementary functions. Full use is 
made of the language of complex variables; in particular, Euler’s formula is 
derived and the unity of the fundamental elementary functions is shown. 

The integral calculus has consciously been explained as far as possible 
using intuitive material in the framework of the Riemann integral. For the 
majority of applications, this is completely adequate.³ Various applications
of the integral are pointed out, including those that lead to an improper
integral (for example, the work involved in escaping from a gravitational field,
and the escape velocity for the Earth’s gravitational field) or to elliptic
functions (motion in a gravitational field in the presence of constraints,
pendulum motion).

The differential calculus of functions of several variables is very geometric. 
In this topic, for example, one studies such important and useful consequences 
of the implicit function theorem as curvilinear coordinates and local reduction 
to canonical form for smooth mappings (the rank theorem) and functions 
(Morse’s lemma), and also the theory of extrema with constraint. 

Results from the theory of continuous functions and differential calculus 
are summarized and explained in a general invariant form in two chapters 
that link up naturally with the differential calculus of real-valued functions 
of several variables. These two chapters open the second part of the course. 
The second book, in which we also discuss the integral calculus of functions
of several variables up to the general Newton–Leibniz–Stokes formula, thus
acquires a certain unity.

We shall give more complete information on the second book in its preface. 
At this point we add only that, in addition to the material already mentioned, 
it contains information on series of functions (power series and Fourier series 
included), on integrals depending on a parameter (including the fundamental 
solution, convolution, and the Fourier transform), and also on asymptotic 
expansions (which are usually absent or insufficiently presented in textbooks). 

We now discuss a few particular problems. 


3 The “stronger” integrals, as is well known, require fussier set-theoretic consider- 
ations, outside the mainstream of the textbook, while adding hardly anything to 
the effective machinery of analysis, mastery of which should be the first priority. 


On the introduction. I have not written an introductory survey of the subject,
since the majority of beginning students already have a preliminary idea of 
differential and integral calculus and their applications from high school, and 
I could hardly claim to write an even more introductory survey. Instead, in the 
first two chapters I bring the former high-school student’s understanding of 
sets, functions, the use of logical symbolism, and the theory of a real number 
to a certain mathematical completeness. 

This material belongs to the formal foundations of analysis and is aimed 
primarily at the mathematics major, who may at some time wish to trace the 
logical structure of the basic concepts and principles used in classical analysis. 
Mathematical analysis proper begins in the third chapter, so that the reader 
who wishes to get effective machinery in his hands as quickly as possible 
and see its applications can in general begin a first reading with Chapter 3, 
turning to the earlier pages whenever something seems nonobvious or raises
a question that, I hope, I have also thought of and answered in the early
chapters.


On the division of material. The material of the two books is divided into 
chapters numbered continuously. The sections are numbered within each 
chapter separately; subsections of a section are numbered only within that 
section. Theorems, propositions, lemmas, definitions, and examples are writ- 
ten in italics for greater logical clarity, and numbered for convenience within 
each section. 


On the supplementary material. Several chapters of the book are written as a 
natural extension of classical analysis. These are, on the one hand, Chapters 
1 and 2 mentioned above, which are devoted to its formal mathematical 
foundations, and on the other hand, Chapters 9, 10, and 15 of the second 
part, which give the modern view of the theory of continuity, differential and 
integral calculus, and finally Chapter 19, which is devoted to certain effective 
asymptotic methods of analysis. 

The question as to which part of the material of these chapters should be 
included in a lecture course depends on the audience and can be decided by 
the lecturer, but certain fundamental concepts introduced here are usually 
present in any exposition of the subject to mathematicians. 

In conclusion, I would like to thank those whose friendly and competent 
professional aid has been valuable and useful to me during the work on this 
book. 

The proposed course was quite detailed, and in many of its aspects it
was coordinated with subsequent modern university mathematics courses
such as, for example, differential equations, differential geometry, the theory
of functions of a complex variable, and functional analysis. In this regard
my contacts and discussions with V.I. Arnol’d and the especially numerous 
ones with S. P. Novikov during our joint work with the so-called “experimental 
student group in natural-science/mathematical education” in the Department 
of Mathematics at MSU, were very useful to me. 

I received much advice from N. V. Efimov, chair of the Section of
Mathematical Analysis in the Department of Mechanics and Mathematics at
Moscow State University.

I am also grateful to colleagues in the department and the section for 
remarks on the mimeographed edition of my lectures. 

Student transcripts of my recent lectures which were made available to 
me were valuable during the work on this book, and I am grateful to their 
owners. 

I am deeply grateful to the official reviewers L. D. Kudryavtsev,
V. P. Petrenko, and S. B. Stechkin for constructive comments, most of which were
taken into account in the book now offered to the reader.


Moscow, 1980 V. Zorich 


1 Some General Mathematical Concepts and Notation


1.1 Logical Symbolism 


1.1.1 Connectives and Brackets


The language of this book, like that of the majority of mathematical texts,
consists of ordinary language and a number of special symbols from the theories
being discussed. Along with the special symbols, which will be introduced
as needed, we use the common symbols of mathematical logic $\neg$, $\wedge$, $\vee$,
$\Rightarrow$, and $\Leftrightarrow$ to denote respectively negation (not) and the logical
connectives and, or, implies, and is equivalent to.¹

For example, take three statements of independent interest: 


L. If the notation is adapted to the discoveries..., the work of thought is
marvelously shortened. (G. Leibniz)²


P. Mathematics is the art of calling different things by the same name.
(H. Poincaré).³


G. The great book of nature is written in the language of mathematics.
(Galileo).⁴


Then, according to the notation given above, 


1 The symbol & is often used in logic in place of $\wedge$. Logicians more often write
the implication symbol $\Rightarrow$ as $\rightarrow$ and the relation of logical equivalence
as $\longleftrightarrow$ or $\leftrightarrow$. However, we shall adhere to the symbolism indicated
in the text so as not to overburden the symbol $\rightarrow$, which has been traditionally
used in mathematics to denote passage to the limit.

2 G. W. Leibniz (1646–1716) – outstanding German scholar, philosopher, and
mathematician to whom belongs the honor, along with Newton, of having
discovered the foundations of the infinitesimal calculus.

3 H. Poincaré (1854–1912) – French mathematician whose brilliant mind
transformed many areas of mathematics and achieved fundamental applications of it
in mathematical physics.

4 Galileo Galilei (1564–1642) – Italian scholar and outstanding scientific
experimenter. His works lie at the foundation of the subsequent physical concepts of
space and time. He is the father of modern physical science.
Notation                                                      Meaning
$L \Rightarrow P$                                             L implies P
$L \Leftrightarrow P$                                         L is equivalent to P
$((L \Rightarrow P) \wedge (\neg P)) \Rightarrow (\neg L)$    If P follows from L and P is false, then L is false
$\neg((L \Leftrightarrow G) \vee (P \Leftrightarrow G))$      G is not equivalent either to L or to P


We see that it is not always reasonable to use only formal notation, avoid- 
ing colloquial language. 

We remark further that parentheses are used in the writing of complex 
statements composed of simpler ones, fulfilling the same syntactical function 
as in algebraic expressions. As in algebra, in order to avoid the overuse of 
parentheses one can make a convention about the order of operations. To 
that end, we shall agree on the following order of priorities for the symbols: 


$\neg,\ \wedge,\ \vee,\ \Rightarrow,\ \Leftrightarrow.$


With this convention the expression $\neg A \wedge B \vee C \Rightarrow D$ should be interpreted
as $(((\neg A) \wedge B) \vee C) \Rightarrow D$, and the relation $A \vee B \Rightarrow C$ as
$(A \vee B) \Rightarrow C$, not as $A \vee (B \Rightarrow C)$.
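The convention can be tried out mechanically. The following is a minimal sketch in Python, whose Boolean operators happen to use the same relative order for the first three symbols (`not` binds tighter than `and`, which binds tighter than `or`). Python has no implication operator, so the helper `implies` and all function names below are assumptions introduced only for illustration.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication a => b (hypothetical helper; not a Python built-in)."""
    return (not a) or b

def unparenthesized(a, b, c, d):
    # Relies on Python's own precedence: not > and > or.
    # Implication has the lowest priority, so it is applied last.
    return implies(not a and b or c, d)

def parenthesized(a, b, c, d):
    # The fully bracketed reading (((not A) and B) or C) => D.
    return implies((((not a) and b) or c), d)

# The two readings agree for every assignment of truth values.
readings_agree = all(
    unparenthesized(*values) == parenthesized(*values)
    for values in product([False, True], repeat=4)
)

def grouped_left(a, b, c):
    # (A or B) => C
    return implies(a or b, c)

def grouped_right(a, b, c):
    # A or (B => C)
    return a or implies(b, c)

# The two bracketings of "A or B => C" are genuinely different statements.
bracketings_differ = any(
    grouped_left(*values) != grouped_right(*values)
    for values in product([False, True], repeat=3)
)
```

For instance, with A true and B, C false the first bracketing is false while the second is true, which is why a convention is needed at all.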

We shall often give a different verbal expression to the notation $A \Rightarrow B$,
which means that A implies B, or, what is the same, that B follows from A,
saying that B is a necessary criterion or necessary condition for A and A in
turn is a sufficient condition or sufficient criterion for B, so that the relation
$A \Leftrightarrow B$ can be read in any of the following ways:

A is necessary and sufficient for B; 

A holds when B holds, and only then;

A if and only if B; 

A is equivalent to B. 

Thus the notation $A \Leftrightarrow B$ means that A implies B and simultaneously B
implies A.

The use of the conjunction and in the expression $A \wedge B$ requires no
explanation.

It should be pointed out, however, that in the expression $A \vee B$ the
conjunction or is not exclusive, that is, the statement $A \vee B$ is regarded as true
if at least one of the statements A and B is true. For example, let x be a
real number such that $x^2 - 3x + 2 = 0$. Then we can write that the following
relation holds:


$(x^2 - 3x + 2 = 0) \Leftrightarrow (x = 1) \vee (x = 2).$
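The equivalence can be spot-checked numerically. The sketch below is only an illustration: it tests integers in an arbitrarily chosen range (an assumption, since the statement concerns all real numbers), and the function names are hypothetical.

```python
def satisfies_equation(x: int) -> bool:
    """Left-hand statement: x^2 - 3x + 2 = 0."""
    return x**2 - 3*x + 2 == 0

def one_or_two(x: int) -> bool:
    """Right-hand statement: (x = 1) or (x = 2)."""
    # Python's `or` is inclusive, like the logical "or" in the text.
    return (x == 1) or (x == 2)

sample = range(-10, 11)  # assumed test range, not part of the original text

# Equivalence means both statements have the same truth value for each x.
equivalence_holds = all(satisfies_equation(x) == one_or_two(x) for x in sample)
roots = [x for x in sample if satisfies_equation(x)]
```

Both statements are false together for every sampled x other than 1 and 2, and true together at those two values, which is exactly what the relation asserts.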


1.1.2 Remarks on Proofs 


A typical mathematical proposition has the form $A \Rightarrow B$, where A is the
assumption and B the conclusion. The proof of such a proposition consists of
constructing a chain $A \Rightarrow C_1 \Rightarrow \cdots \Rightarrow C_n \Rightarrow B$ of implications,
each element of which is either an axiom or a previously proved proposition.⁵

In proofs we shall adhere to the classical rule of inference: if A is true and
$A \Rightarrow B$, then B is also true.

In proof by contradiction we shall also use the law of excluded middle,
by virtue of which the statement $A \vee \neg A$ (A or not-A) is considered true
independently of the specific content of the statement A. Consequently we
simultaneously accept that $\neg(\neg A) \Leftrightarrow A$, that is, double negation is equivalent
to the original statement.
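In two-valued logic these rules can be confirmed by running through all truth assignments. The following Python sketch does so; it is a check within the classical setting already assumed above, not a justification of it, and `implies` is a hypothetical helper standing in for the implication symbol.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication a => b (hypothetical helper)."""
    return (not a) or b

truth_values = [False, True]

# Law of excluded middle: "A or not A" is true whatever A is.
excluded_middle = all(a or (not a) for a in truth_values)

# Double negation: "not (not A)" has the same truth value as A.
double_negation = all((not (not a)) == a for a in truth_values)

# Rule of inference: (A and (A => B)) => B is true for all A, B.
rule_of_inference = all(
    implies(a and implies(a, b), b)
    for a, b in product(truth_values, repeat=2)
)
```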


1.1.3 Some Special Notation 


For the reader’s convenience and to shorten the writing, we shall agree to 
denote the end of a proof by the symbol $\square$.

We also agree, whenever convenient, to introduce definitions using the 
special symbol := (equality by definition), in which the colon is placed on 
the side of the object being defined. 

For example, the notation

$$\int_a^b f(x)\,dx := \lim_{\lambda(P) \to 0} \sigma(f; P, \xi)$$

defines the left-hand side in terms of the right-hand side, whose meaning is
assumed to be known.
Similarly, one can introduce abbreviations for expressions already defined.
For example

$$\sum_{i=1}^{n} f(\xi_i)\,\Delta x_i =: \sigma(f; P, \xi)$$

introduces the notation $\sigma(f; P, \xi)$ for the sum of special form on the left-hand
side.
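To make the abbreviated sum concrete, here is a minimal Python sketch. It assumes the usual reading of the symbols, which the text leaves for later: P is a partition a = x_0 < x_1 < ... < x_n = b, each tag ξ_i lies in the i-th subinterval, and Δx_i = x_i − x_{i−1}. The function name `riemann_sum` and the sample data are hypothetical.

```python
from typing import Callable, Sequence

def riemann_sum(f: Callable[[float], float],
                partition: Sequence[float],
                tags: Sequence[float]) -> float:
    """Compute sigma(f; P, xi) = sum over i of f(xi_i) * (x_i - x_{i-1})."""
    if len(tags) != len(partition) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for i, xi in enumerate(tags):
        left, right = partition[i], partition[i + 1]
        if not left <= xi <= right:
            raise ValueError("each tag must lie in its subinterval")
        total += f(xi) * (right - left)
    return total

# Sample data (assumptions): f(x) = x^2 on [0, 1], a uniform partition
# into n pieces, and the midpoint of each piece as its tag.
n = 1000
P = [i / n for i in range(n + 1)]
xi = [(P[i] + P[i + 1]) / 2 for i in range(n)]
approx = riemann_sum(lambda x: x * x, P, xi)
```

As the partition is refined, such sums approach the value that the first formula denotes by the integral; for this sample function that value is 1/3.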


1.1.4 Concluding Remarks 


We note that here we have spoken essentially about notation only, without 
analyzing the formalism of logical deductions and without touching on the 
profound questions of truth, provability, and deducibility, which form the 
subject matter of mathematical logic. 

How are we to construct mathematical analysis if we have no formalization
of logic? There may be some consolation in the fact that we always know more
than we can formalize at any given time, or perhaps we should say we know
how to do more than we can formalize. This last sentence may be clarified by
the well-known proverb of the centipede who forgot how to walk when asked
to explain exactly how it dealt with so many legs.


5 The notation $A \Rightarrow B \Rightarrow C$ will be used as an abbreviation for
$(A \Rightarrow B) \wedge (B \Rightarrow C)$.

The experience of all the sciences convinces us that what was consid- 
ered clear or simple and unanalyzable yesterday may be subjected to re- 
examination or made more precise today. Such was the case (and will un- 
doubtedly be the case again) with many concepts of mathematical analysis, 
the most important theorems and machinery of which were discovered in the 
seventeenth and eighteenth centuries, but which acquired its modern formal- 
ized form with a unique interpretation that is probably responsible for its 
being generally accessible, only after the creation of the theory of limits and 
the fully developed theory of real numbers needed for it in the nineteenth 
century. 

This is the level of the theory of real numbers from which we shall begin 
to construct the whole edifice of analysis in Chap. 2. 

As already noted in the preface, those who wish to make a rapid ac- 
quaintance with the basic concepts and effective machinery of differential 
and integral calculus proper may begin immediately with Chap. 3, turning 
to particular places in the first two chapters only as needed. 


1.1.5 Exercises 


We shall denote true assertions by the symbol 1 and false ones by 0. Then to each
of the statements $\neg A$, $A \wedge B$, $A \vee B$, and $A \Rightarrow B$ one can associate a so-called
truth table, which indicates its truth or falsehood depending on the truth of the
statements A and B. These tables are a formal definition of the logical operations
$\neg$, $\wedge$, $\vee$, $\Rightarrow$. Here they are:

A    B    $\neg A$    $A \wedge B$    $A \vee B$    $A \Rightarrow B$
0    0    1           0               0             1
0    1    1           0               1             1
1    0    0           0               0             0
1    1    0           1               1             1


1. Check whether all of these tables agree with your concept of the corresponding
logical operation. (In particular, pay attention to the fact that if A is false, then
the implication $A \Rightarrow B$ is always true.)

2. Show that the following simple, but very useful relations, which are widely used
in mathematical reasoning, are true:

a) $\neg(A \wedge B) \Leftrightarrow \neg A \vee \neg B$;

b) $\neg(A \vee B) \Leftrightarrow \neg A \wedge \neg B$;

c) $(A \Rightarrow B) \Leftrightarrow (\neg B \Rightarrow \neg A)$;

d) $(A \Rightarrow B) \Leftrightarrow (\neg A \vee B)$;

e) $\neg(A \Rightarrow B) \Leftrightarrow A \wedge \neg B$.


1.2 Sets and Elementary Operations on Them


1.2.1 The Concept of a Set 


Since the late nineteenth and early twentieth centuries the most universal
language of mathematics has been the language of set theory. This is even
manifest in one of the definitions of mathematics as the science that studies
different structures (relations) on sets.⁶

“We take a set to be an assemblage of definite, perfectly distinguishable 
objects of our intuition or our thought into a coherent whole.” Thus did
Georg Cantor,⁷ the creator of set theory, describe the concept of a set.

Cantor’s description cannot, of course, be considered a definition, since it 
appeals to concepts that may be more complicated than the concept of a set 
itself (and in any case, have not been defined previously). The purpose of this 
description is to explain the concept by connecting it with other concepts. 

The basic assumptions of Cantorian (or, as it is generally called, “naive”)
set theory reduce to the following statements.


1°. A set may consist of any distinguishable objects. 


2°. A set is unambiguously determined by the collection of objects that com- 
prise it. 


3°. Any property defines the set of objects having that property.


If x is an object, P is a property, and P(x) denotes the assertion that x 
has property P, then the class of objects having the property P is denoted 
{x| P(x)}. The objects that constitute a class or set are called the elements 
of the class or set. 

The set consisting of the elements $x_1, \ldots, x_n$ is usually denoted
$\{x_1, \ldots, x_n\}$. Wherever no confusion can arise we allow ourselves to denote
the one-element set {a} simply as a.
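Both notations have close analogues in Python, shown in the sketch below. One difference should be stated plainly: a set comprehension needs a stock of objects to draw from, so the sketch assumes a small finite `universe`, whereas the notation {x | P(x)} of the text places no such restriction. The property `is_even` is a hypothetical example.

```python
def is_even(x: int) -> bool:
    """A sample property P (hypothetical example)."""
    return x % 2 == 0

universe = range(10)  # assumed finite stock of objects to test

# {x | P(x)}, restricted to the assumed universe.
evens = {x for x in universe if is_even(x)}

# {x_1, ..., x_n}: the same set given by listing its elements.
listed = {0, 2, 4, 6, 8}
```

The two descriptions produce the same set, in line with statement 2° that a set is determined by the objects that comprise it.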


6 Bourbaki, N. “The architecture of mathematics” in: N. Bourbaki, Elements of the
history of mathematics, translated from the French by John Meldrum, Springer,
New York, 1994.

7 G. Cantor (1845–1918) – German mathematician, the creator of the theory of
infinite sets and the progenitor of set-theoretic language in mathematics.

The words “class”, “family”, “totality”, and “collection” are used as
synonyms for “set” in naive set theory.
The following examples illustrate the application of this terminology: 


— the set of letters “a” occurring in the word “I”; 

— the set of wives of Adam; 

— the collection of ten decimal digits; 

— the family of beans; 

— the set of grains of sand on the Earth; 

— the totality of points of a plane equidistant from two given points of the 
plane; 

— the family of sets; 

— the set of all sets. 


The variety in the possible degree of determinacy in the definition of a 
set leads one to think that a set is, after all, not such a simple and harmless 
concept. 

And in fact the concept of the set of all sets, for example, is simply 
contradictory. 


Proof. Indeed, suppose that for a set M the notation P(M) means that M 
is not an element of itself. 

Consider the class K = {M| P(M)} of sets having property P. 

If K is a set, then either $P(K)$ or $\neg P(K)$ is true. However, this dichotomy does
not apply to K. Indeed, $P(K)$ is impossible; for it would then follow from
the definition of K that K contains K as an element, that is, that $\neg P(K)$ is
true; on the other hand, $\neg P(K)$ is also impossible, since that means that K
contains K as an element, which contradicts the definition of K as the class
of sets that do not contain themselves as elements.

Consequently K is not a set. $\square$


This is the classical paradox of Russell,⁸ one of the paradoxes to which
the naive conception of a set leads.
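The circularity in the argument can be observed in a toy model. The Python sketch below represents a "set" by its membership test, a function that answers whether a given object belongs to it; this modelling choice and all names in it are assumptions made for illustration, not part of set theory itself. Asking whether K belongs to K then requires the answer to the same question first, and the evaluation never settles.

```python
def P(M):
    """Property P: the 'set' M (modelled as a membership test) is not an element of itself."""
    return not M(M)

# K = {M | P(M)}: belonging to K means exactly having property P.
K = P

def evens(x):
    """An ordinary 'set': the even integers."""
    return isinstance(x, int) and x % 2 == 0

# The set of even integers is not an even integer, so it has property P.
evens_is_ordinary = P(evens)

try:
    K(K)  # K(K) = not K(K) = not (not K(K)) = ...
    verdict = "K(K) has a definite truth value"
except RecursionError:
    verdict = "K(K) never settles on a truth value"
```

Python reports the endless regress as a `RecursionError`, a mechanical echo of the fact that neither P(K) nor its negation can hold.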

In modern mathematical logic the concept of a set has been subjected to 
detailed analysis (with good reason, as we see). However, we shall not go into 
that analysis. We note only that in the current axiomatic set theories a set 
is defined as a mathematical object having a definite collection of properties. 

The description of these properties constitutes an axiom system. The core 
of axiomatic set theory is the postulation of rules by which new sets can be 
formed from given ones. In general any of the current axiom systems is such 
that, on the one hand, it eliminates the known contradictions of the naive 
theory, and on the other hand it provides freedom to operate with specific 
sets that arise in different areas of mathematics, most of all, in mathematical 
analysis understood in the broad sense of the word. 


8 B. Russell (1872-1970) - British logician, philosopher, sociologist and social
activist.

Having confined ourselves for the time being to remarks on the concept of
a set, we pass to the description of the set-theoretic relations and operations
most commonly used in analysis.

Those wishing a more detailed acquaintance with the concept of a set 
should study Subsect. 1.4.2 in the present chapter or turn to the specialized 
literature. 


1.2.2 The Inclusion Relation 


As has already been pointed out, the objects that comprise a set are usually 
called the elements of the set. We tend to denote sets by uppercase letters 
and their elements by the corresponding lowercase letters. 

The statement “$x$ is an element of the set $X$” is written briefly as

$$x \in X \quad (\text{or } X \ni x),$$

and its negation as

$$x \notin X \quad (\text{or } X \not\ni x).$$


When statements about sets are written, frequent use is made of the
logical operators $\exists$ (“there exists” or “there are”) and $\forall$ (“every” or “for any”),
which are called the existence and generalization quantifiers respectively.

For example, the string $\forall x\,((x \in A) \Leftrightarrow (x \in B))$ means that for any object
$x$ the relations $x \in A$ and $x \in B$ are equivalent. Since a set is completely
determined by its elements, this statement is usually written briefly as

$$A = B,$$

read “$A$ equals $B$”, and means that the sets $A$ and $B$ are the same.

Thus two sets are equal if they consist of the same elements.

The negation of equality is usually written as $A \ne B$.

If every element of $A$ is an element of $B$, we write $A \subset B$ or $B \supset A$ and
say that $A$ is a subset of $B$ or that $B$ contains $A$ or that $B$ includes $A$. In this
connection the relation $A \subset B$ between sets $A$ and $B$ is called the inclusion
relation (Fig. 1.1).


Fig. 1.1. 




Thus

$$(A \subset B) := \forall x\,((x \in A) \Rightarrow (x \in B)).$$

If $A \subset B$ and $A \ne B$, we shall say that the inclusion $A \subset B$ is strict or
that $A$ is a proper subset of $B$.

Using these definitions, we can now conclude that

$$(A = B) \Leftrightarrow (A \subset B) \wedge (B \subset A).$$

If $M$ is a set, any property $P$ distinguishes in $M$ the subset

$$\{x \in M \mid P(x)\}$$

consisting of the elements of $M$ that have the property.

For example, it is obvious that

$$M = \{x \in M \mid x \in M\}.$$

On the other hand, if $P$ is taken as a property that no element of the set $M$
has, for example, $P(x) := (x \ne x)$, we obtain the set

$$\varnothing = \{x \in M \mid x \ne x\},$$

called the empty subset of $M$.


1.2.3 Elementary Operations on Sets 


Let $A$ and $B$ be subsets of a set $M$.

a. The union of $A$ and $B$ is the set

$$A \cup B := \{x \in M \mid (x \in A) \vee (x \in B)\},$$

consisting of precisely the elements of $M$ that belong to at least one of the
sets $A$ and $B$ (Fig. 1.2).

b. The intersection of $A$ and $B$ is the set

$$A \cap B := \{x \in M \mid (x \in A) \wedge (x \in B)\},$$

formed by the elements of $M$ that belong to both sets $A$ and $B$ (Fig. 1.3).

c. The difference between $A$ and $B$ is the set

$$A \setminus B := \{x \in M \mid (x \in A) \wedge (x \notin B)\},$$

consisting of the elements of $A$ that do not belong to $B$ (Fig. 1.4).
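For finite sets the three definitions can be tried out directly. The following Python sketch is only an illustration: the universe M and the subsets A and B are arbitrary values chosen here, not taken from the text.

```python
# Illustrative universe M and two of its subsets (values are assumptions).
M = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Set-builder forms, written as in items a, b, and c.
union = {x for x in M if (x in A) or (x in B)}
intersection = {x for x in M if (x in A) and (x in B)}
difference = {x for x in M if (x in A) and (x not in B)}

# Python's built-in set operators give the same sets.
print(union == A | B, intersection == A & B, difference == A - B)
```

The comprehensions mirror the logical connectives in the definitions, while the operators `|`, `&`, and `-` are the usual shorthand.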

The difference between the set $M$ and one of its subsets $A$ is usually called
the complement of $A$ in $M$ and denoted $C_M A$, or $CA$ when the set in which
the complement of $A$ is being taken is clear from the context (Fig. 1.5).

Fig. 1.2. Fig. 1.3. Fig. 1.4. Fig. 1.5.


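Example. As an illustration of the interaction of the concepts just introduced,
let us verify the following relations (the so-called de Morgan⁹ rules):

$$C_M(A \cup B) = C_M A \cap C_M B, \qquad (1.1)$$
$$C_M(A \cap B) = C_M A \cup C_M B. \qquad (1.2)$$


Proof. We shall prove the first of these equalities by way of example:

$$(x \in C_M(A \cup B)) \Rightarrow (x \notin (A \cup B)) \Rightarrow ((x \notin A) \wedge (x \notin B)) \Rightarrow$$
$$\Rightarrow ((x \in C_M A) \wedge (x \in C_M B)) \Rightarrow (x \in (C_M A \cap C_M B)).$$

Thus we have established that

$$C_M(A \cup B) \subset (C_M A \cap C_M B). \qquad (1.3)$$

On the other hand,

$$(x \in (C_M A \cap C_M B)) \Rightarrow ((x \in C_M A) \wedge (x \in C_M B)) \Rightarrow$$
$$\Rightarrow ((x \notin A) \wedge (x \notin B)) \Rightarrow (x \notin (A \cup B)) \Rightarrow$$
$$\Rightarrow (x \in C_M(A \cup B)),$$

that is,

$$(C_M A \cap C_M B) \subset C_M(A \cup B). \qquad (1.4)$$

Equation (1.1) follows from (1.3) and (1.4). $\square$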
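The de Morgan rules can also be checked mechanically on a small finite set. The sketch below is a sanity check and not a substitute for the proof: it tests (1.1) and (1.2) for every pair of subsets of an illustrative four-element set M, and the helper names `subsets`, `complement`, and `de_morgan_holds` are introduced here only for the example.

```python
from itertools import combinations

def subsets(universe):
    """Return all subsets of a finite set as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

M = frozenset({0, 1, 2, 3})  # illustrative universe (an assumption)

def complement(S):
    """Complement of S in M."""
    return M - S

def de_morgan_holds():
    """Check (1.1) and (1.2) for every pair of subsets A, B of M."""
    return all(
        complement(A | B) == complement(A) & complement(B)
        and complement(A & B) == complement(A) | complement(B)
        for A in subsets(M)
        for B in subsets(M)
    )

print(de_morgan_holds())
```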


d. The direct (Cartesian) product of sets. For any two sets $A$ and $B$ one can
form a new set, namely the pair $\{A, B\} = \{B, A\}$, which consists of the sets
$A$ and $B$ and no others. This set has two elements if $A \ne B$ and one element
if $A = B$.

This set is called the unordered pair of sets $A$ and $B$, to be distinguished
from the ordered pair $(A, B)$ in which the elements are endowed with additional
properties to distinguish the first and second elements of the pair
$\{A, B\}$. The equality

$$(A, B) = (C, D)$$

between two ordered pairs means by definition that $A = C$ and $B = D$. In
particular, if $A \ne B$, then $(A, B) \ne (B, A)$.


9 A. de Morgan (1806-1871) - Scottish mathematician.

Now let $X$ and $Y$ be arbitrary sets. The set

$$X \times Y := \{(x, y) \mid (x \in X) \wedge (y \in Y)\},$$

formed by the ordered pairs $(x, y)$ whose first element belongs to $X$ and whose
second element belongs to $Y$, is called the direct or Cartesian product of the
sets $X$ and $Y$ (in that order!).

It follows obviously from the definition of the direct product and the
remarks made above about the ordered pair that in general $X \times Y \ne Y \times X$.
Equality holds only if $X = Y$. In this last case we abbreviate $X \times X$ as $X^2$.

The direct product is also called the Cartesian product in honor of
Descartes,¹⁰ who arrived at the language of analytic geometry in terms of
a system of coordinates independently of Fermat.¹¹ The familiar system of
Cartesian coordinates in the plane makes this plane precisely into the direct
product of two real axes. This familiar object shows vividly why the Cartesian
product depends on the order of the factors. For example, different points of
the plane correspond to the pairs $(0, 1)$ and $(1, 0)$.

In the ordered pair $z = (x_1, x_2)$, which is an element of the direct product
$Z = X_1 \times X_2$ of the sets $X_1$ and $X_2$, the element $x_1$ is called the first projection
of the pair $z$ and denoted $\mathrm{pr}_1 z$, while the element $x_2$ is the second projection
of $z$ and is denoted $\mathrm{pr}_2 z$.

By analogy with the terminology of analytic geometry, the projections of 
an ordered pair are often called the (first and second) coordinates of the pair. 
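As a small illustration of the direct product and its projections, ordered pairs can be modeled by Python tuples. The sets X and Y below and the function names `pr1` and `pr2` are assumptions made for the sketch.

```python
from itertools import product

X = {0, 1}
Y = {"a", "b", "c"}

# X x Y and Y x X as sets of ordered pairs (tuples).
XY = set(product(X, Y))
YX = set(product(Y, X))

def pr1(z):
    """First projection (first coordinate) of an ordered pair."""
    return z[0]

def pr2(z):
    """Second projection (second coordinate) of an ordered pair."""
    return z[1]

print(len(XY), XY == YX, (0, 1) == (1, 0))
```

Tuples compare componentwise, which matches the definition of equality of ordered pairs given above.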


1.2.4 Exercises 


In Exercises 1, 2, and 3 the letters A, B, and C denote subsets of a set M. 


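1. Verify the following relations.
a) $(A \subset C) \wedge (B \subset C) \Leftrightarrow ((A \cup B) \subset C)$;
b) $(C \subset A) \wedge (C \subset B) \Leftrightarrow (C \subset (A \cap B))$;
c) $C_M(C_M A) = A$;
d) $(A \subset C_M B) \Leftrightarrow (B \subset C_M A)$;
e) $(A \subset B) \Leftrightarrow (C_M A \supset C_M B)$.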


10 R. Descartes (1596-1650) - outstanding French philosopher, mathematician and
physicist who made fundamental contributions to scientific thought and
knowledge.

11 P. Fermat (1601-1665) - remarkable French mathematician, a lawyer by
profession. He was one of the founders of a number of areas of modern mathematics:
analysis, analytic geometry, probability theory, and number theory.

2. Prove the following statements.
a) $A \cup (B \cup C) = (A \cup B) \cup C =: A \cup B \cup C$;
b) $A \cap (B \cap C) = (A \cap B) \cap C =: A \cap B \cap C$;
c) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$;
d) $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.


3. Verify the connection (duality) between the operations of union and intersection:
a) $C_M(A \cup B) = C_M A \cap C_M B$;
b) $C_M(A \cap B) = C_M A \cup C_M B$.


4. Give geometric representations of the following Cartesian products. 
a) The product of two line segments (a rectangle). 
b) The product of two lines (a plane). 
c) The product of a line and a circle (an infinite cylindrical surface). 
d) The product of a line and a disk (an infinite solid cylinder). 
e) The product of two circles (a torus). 


f) The product of a circle and a disk (a solid torus). 


5. The set $\Delta = \{(x_1, x_2) \in X^2 \mid x_1 = x_2\}$ is called the diagonal of the Cartesian
square $X^2$ of the set $X$. Give geometric representations of the diagonals of the sets
obtained in parts a), b), and e) of Exercise 4.


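6. Show that
a) $(X \times Y = \varnothing) \Leftrightarrow (X = \varnothing) \vee (Y = \varnothing)$, and if $X \times Y \ne \varnothing$, then
b) $(A \times B \subset X \times Y) \Leftrightarrow (A \subset X) \wedge (B \subset Y)$,
c) $(X \times Y) \cup (Z \times Y) = (X \cup Z) \times Y$,
d) $(X \times Y) \cap (X' \times Y') = (X \cap X') \times (Y \cap Y')$.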


Here Ø denotes the empty set, that is, the set having no elements. 


7. By comparing the relations of Exercise 3 with relations a) and b) from Exercise
2 of Sect. 1.1, establish a correspondence between the logical operators $\neg$, $\wedge$, $\vee$ and
the operations $C$, $\cap$, and $\cup$ on sets.


1.3 Functions 


1.3.1 The Concept of a Function (Mapping) 


We shall now describe the concept of a functional relation, which is funda- 
mental both in mathematics and elsewhere. 
Let $X$ and $Y$ be certain sets. We say that there is a function defined on
$X$ with values in $Y$ if, by virtue of some rule $f$, to each element $x \in X$ there
corresponds an element $y \in Y$.

In this case the set $X$ is called the domain of definition of the function.
The symbol $x$ used to denote a general element of the domain is called the
argument of the function, or the independent variable. The element $y_0 \in Y$
corresponding to a particular value $x_0 \in X$ of the argument $x$ is called the
value of the function at $x_0$, or the value of the function at the value $x = x_0$
of its argument, and is denoted $f(x_0)$. As the argument $x \in X$ varies, the
value $y = f(x) \in Y$, in general, varies depending on the values of $x$. For that
reason, the quantity $y = f(x)$ is often called the dependent variable.

The set

$$f(X) := \{y \in Y \mid \exists x\,((x \in X) \wedge (y = f(x)))\}$$

of values assumed by a function on elements of the set $X$ will be called the
set of values or the range of the function.

The term “function” has a variety of useful synonyms in different areas 
of mathematics, depending on the nature of the sets X and Y: mapping, 
transformation, morphism, operator, functional. The commonest is mapping, 
and we shall also use it frequently. 

For a function (mapping) the following notations are standard:

$$f : X \to Y, \qquad X \xrightarrow{f} Y.$$

When it is clear from the context what the domain and range of a function
are, one also uses the notation $x \mapsto f(x)$ or $y = f(x)$, but more frequently a
function in general is simply denoted by the single symbol $f$.

Two functions $f_1$ and $f_2$ are considered identical or equal if they have the
same domain $X$ and at each element $x \in X$ the values $f_1(x)$ and $f_2(x)$ are
the same. In this case we write $f_1 = f_2$.

If $A \subset X$ and $f : X \to Y$ is a function, we denote by $f|A$ or $f|_A$ the
function $\varphi : A \to Y$ that agrees with $f$ on $A$. More precisely, $f|_A(x) := \varphi(x)$
if $x \in A$. The function $f|_A$ is called the restriction of $f$ to $A$, and the function
$f : X \to Y$ is called an extension or a continuation of $\varphi$ to $X$.

We see that it is sometimes necessary to consider a function $\varphi : A \to Y$
defined on a subset $A$ of some set $X$ while the range $\varphi(A)$ of $\varphi$ may also
turn out to be a subset of $Y$ that is different from $Y$. In this connection, we
sometimes use the term domain of departure of the function to denote any
set $X$ containing the domain of a function, and domain of arrival to denote
any subset of $Y$ containing its range.

Thus, defining a function (mapping) involves specifying a triple (X, Y, f), 
where 

X is the set being mapped, or domain of the function; 

Y is the set into which the mapping goes, or a domain of arrival of the 
function; 

$f$ is the rule according to which a definite element $y \in Y$ is assigned to
each element $x \in X$.

The asymmetry between X and Y that appears here reflects the fact that 
the mapping goes from X to Y, and not the other direction. 
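The triple (X, Y, f) can be modeled directly for finite sets. The class below is a minimal sketch under that assumption; the name `Mapping`, its fields, and the sample function `square` are hypothetical and serve only to show the domain, a domain of arrival, the set of values, and a restriction.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Mapping:
    """A function as a triple (X, Y, f): domain, domain of arrival, rule."""
    domain: frozenset
    codomain: frozenset
    rule: Callable

    def __call__(self, x):
        if x not in self.domain:
            raise ValueError(f"{x!r} is not in the domain")
        y = self.rule(x)
        if y not in self.codomain:
            raise ValueError(f"value {y!r} is outside the domain of arrival")
        return y

    def values(self):
        """The set of values (range) f(X)."""
        return frozenset(self(x) for x in self.domain)

    def restrict(self, subset):
        """The restriction of the mapping to a subset A of the domain."""
        subset = frozenset(subset)
        if not subset <= self.domain:
            raise ValueError("A must be a subset of the domain")
        return Mapping(subset, self.codomain, self.rule)

# x -> x^2 on {-3, ..., 3} with values in {0, ..., 9}.
square = Mapping(frozenset(range(-3, 4)), frozenset(range(10)), lambda x: x * x)
print(sorted(square.values()))
```

Here the set of values is a proper subset of the domain of arrival, and `square.restrict({0, 1, 2})` agrees with `square` on the smaller domain while rejecting arguments outside it.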

Now let us consider some examples of functions.

Example 1. The formulas $l = 2\pi r$ and $V = \frac{4}{3}\pi r^3$ establish functional
relationships between the circumference $l$ of a circle and its radius $r$ and between
the volume $V$ of a ball and its radius $r$. Each of these formulas provides
a particular function $f : \mathbb{R}_+ \to \mathbb{R}_+$ defined on the set $\mathbb{R}_+$ of positive real
numbers with values in the same set.


Example 2. Let $X$ be the set of inertial coordinate systems and $c : X \to \mathbb{R}$
the function that assigns to each coordinate system $x \in X$ the value $c(x)$ of
the speed of light in vacuo measured using those coordinates. The function
$c : X \to \mathbb{R}$ is constant, that is, for any $x \in X$ it has the same value $c$. (This
is a fundamental experimental fact.)


Example 3. The mapping $G : \mathbb{R}^2 \to \mathbb{R}^2$ (the direct product
$\mathbb{R}^2 = \mathbb{R} \times \mathbb{R} = \mathbb{R}_t \times \mathbb{R}_x$ of the time axis $\mathbb{R}_t$ and the spatial axis $\mathbb{R}_x$)
into itself defined by the formulas

$$x' = x - vt,$$
$$t' = t,$$

is the classical Galilean transformation for transition from one inertial
coordinate system $(x, t)$ to another system $(x', t')$ that is in motion relative to
the first at speed $v$.

The same purpose is served by the mapping $L : \mathbb{R}^2 \to \mathbb{R}^2$ defined by the
relations

$$x' = \frac{x - vt}{\sqrt{1 - (v/c)^2}},$$
$$t' = \frac{t - (v/c^2)\,x}{\sqrt{1 - (v/c)^2}}.$$

This is the well-known (one-dimensional) Lorentz¹² transformation, which
plays a fundamental role in the special theory of relativity. The speed $c$ is the
speed of light.
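Both transformations are ordinary functions of the pair (x, t) and can be evaluated numerically. The sketch below assumes SI units and the standard one-dimensional Lorentz formulas; the function names and the constant `C_LIGHT` are introduced here for illustration.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s

def galilean(x, t, v):
    """Galilean transformation: (x, t) -> (x - v t, t)."""
    return x - v * t, t

def lorentz(x, t, v, c=C_LIGHT):
    """One-dimensional Lorentz transformation for relative speed |v| < c."""
    if abs(v) >= c:
        raise ValueError("the transformation requires |v| < c")
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (x - v * t), gamma * (t - v * x / c ** 2)

# In units where c = 1, the quantity (c t)^2 - x^2 is unchanged by L.
x1, t1 = lorentz(1.0, 2.0, 0.6, c=1.0)
print(math.isclose(t1 ** 2 - x1 ** 2, 2.0 ** 2 - 1.0 ** 2))
```

For speeds small compared with c the factor `gamma` is close to 1 and the Lorentz formulas are close to the Galilean ones.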


Example 4. The projection $\mathrm{pr}_1 : X_1 \times X_2 \to X_1$ defined by the correspondence
$X_1 \times X_2 \ni (x_1, x_2) \overset{\mathrm{pr}_1}{\longmapsto} x_1 \in X_1$ is obviously a function. The second
projection $\mathrm{pr}_2 : X_1 \times X_2 \to X_2$ is defined similarly.


Example 5. Let $\mathcal{P}(M)$ be the set of subsets of the set $M$. To each set
$A \in \mathcal{P}(M)$ we assign the set $C_M A \in \mathcal{P}(M)$, that is, the complement to
$A$ in $M$. We then obtain a mapping $C_M : \mathcal{P}(M) \to \mathcal{P}(M)$ of the set $\mathcal{P}(M)$
into itself.

12 H. A. Lorentz (1853-1928) - Dutch physicist. He discovered these transformations
in 1904, and Einstein made crucial use of them when he formulated his special
theory of relativity in 1905.

Example 6. Let $E \subset M$. The real-valued function $\chi_E : M \to \mathbb{R}$ defined on
the set $M$ by the conditions $(\chi_E(x) = 1 \text{ if } x \in E) \wedge (\chi_E(x) = 0 \text{ if } x \in C_M E)$
is called the characteristic function of the set $E$.


Example 7. Let $M(X; Y)$ be the set of mappings of the set $X$ into the set
$Y$ and $x_0$ a fixed element of $X$. To any function $f \in M(X; Y)$ we assign
its value $f(x_0) \in Y$ at the element $x_0$. This relation defines a function
$F : M(X; Y) \to Y$. In particular, if $Y = \mathbb{R}$, that is, $Y$ is the set of real numbers,
then to each function $f : X \to \mathbb{R}$ the function $F : M(X; \mathbb{R}) \to \mathbb{R}$ assigns
the number $F(f) = f(x_0)$. Thus $F$ is a function defined on functions. For
convenience, such functions are called functionals.


Example 8. Let $\Gamma$ be the set of curves lying on a surface (for example, the
surface of the earth) and joining two given points of the surface. To each
curve $\gamma \in \Gamma$ one can assign its length. We then obtain a function $F : \Gamma \to \mathbb{R}$
that often needs to be studied in order to find the shortest curve, or as it is
called, the geodesic between the two given points on the surface.


Example 9. Consider the set $M(\mathbb{R}; \mathbb{R})$ of real-valued functions defined on the
entire real line $\mathbb{R}$. After fixing a number $a \in \mathbb{R}$, we assign to each function
$f \in M(\mathbb{R}; \mathbb{R})$ the function $f_a \in M(\mathbb{R}; \mathbb{R})$ connected with it by the relation
$f_a(x) = f(x + a)$. The function $f_a(x)$ is usually called the translate or shift
of the function $f$ by $a$. The mapping $A : M(\mathbb{R}; \mathbb{R}) \to M(\mathbb{R}; \mathbb{R})$ that arises
in this way is called the translation or shift operator. Thus the operator $A$ is
defined on functions and its values are also functions $f_a = A(f)$.

This last example might seem artificial if not for the fact that we encounter
real operators at every turn. Thus, any radio receiver is an operator $f \mapsto \hat{f}$
that transforms electromagnetic signals $f$ into acoustic signals $\hat{f}$; any of our
sensory organs is an operator (transformer) with its own domain of definition
and range of values.
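Functionals and operators in the sense of Examples 7 and 9 correspond to what programmers call higher-order functions. The sketch below is an illustration only; the names `evaluation_functional` and `shift_operator` are chosen here and do not come from the text.

```python
def evaluation_functional(x0):
    """Return the functional F with F(f) = f(x0), as in Example 7."""
    def F(f):
        return f(x0)
    return F

def shift_operator(a):
    """Return the operator A with A(f) = f_a, where f_a(x) = f(x + a)."""
    def A(f):
        def f_a(x):
            return f(x + a)
        return f_a
    return A

def square(x):
    return x * x

F = evaluation_functional(3)   # takes a function, returns a number
A = shift_operator(2)          # takes a function, returns a function
print(F(square), A(square)(1))
```

`F(square)` is the number `square(3)`, while `A(square)` is again a function, namely the map sending x to (x + 2) squared.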


Example 10. The position of a particle in space is determined by an ordered
triple of numbers $(x, y, z)$ called its spatial coordinates. The set of all such
ordered triples can be thought of as the direct product $\mathbb{R} \times \mathbb{R} \times \mathbb{R} = \mathbb{R}^3$ of
three real lines $\mathbb{R}$.

A particle in motion is located at some point of the space $\mathbb{R}^3$ having
coordinates $(x(t), y(t), z(t))$ at each instant $t$ of time. Thus the motion of a
particle can be interpreted as a mapping $\gamma : \mathbb{R} \to \mathbb{R}^3$, where $\mathbb{R}$ is the time
axis and $\mathbb{R}^3$ is three-dimensional space.

If a system consists of $n$ particles, its configuration is defined by the
position of each of the particles, that is, it is defined by an ordered set
$(x_1, y_1, z_1;\ x_2, y_2, z_2;\ \ldots;\ x_n, y_n, z_n)$ consisting of $3n$ numbers. The set of all
such ordered sets is called the configuration space of the system of $n$ particles.
Consequently, the configuration space of a system of $n$ particles can be
interpreted as the direct product $\mathbb{R}^3 \times \mathbb{R}^3 \times \cdots \times \mathbb{R}^3 = \mathbb{R}^{3n}$ of $n$ copies of $\mathbb{R}^3$.

To the motion of a system of $n$ particles there corresponds a mapping
$\gamma : \mathbb{R} \to \mathbb{R}^{3n}$ of the time axis into the configuration space of the system.

Example 11. The potential energy $U$ of a mechanical system is connected
with the mutual positions of the particles of the system, that is, it is determined
by the configuration that the system has. Let $Q$ be the set of possible
configurations of a system. This is a certain subset of the configuration space
of the system. To each position $q \in Q$ there corresponds a certain value $U(q)$
of the potential energy of the system. Thus the potential energy is a function
$U : Q \to \mathbb{R}$ defined on a subset $Q$ of the configuration space with values in
the domain $\mathbb{R}$ of real numbers.


Example 12. The kinetic energy $K$ of a system of $n$ material particles depends
on their velocities. The total mechanical energy of the system $E$, defined as
$E = K + U$, that is, the sum of the kinetic and potential energies, thus
depends on both the configuration $q$ of the system and the set of velocities
$v$ of its particles. Like the configuration $q$ of the particles in space, the set of
velocities $v$, which consists of $n$ three-dimensional vectors, can be defined as
an ordered set of $3n$ numbers. The ordered pairs $(q, v)$ corresponding to the
states of the system form a subset $\Phi$ in the direct product $\mathbb{R}^{3n} \times \mathbb{R}^{3n} = \mathbb{R}^{6n}$,
called the phase space of the system of $n$ particles (to be distinguished from
the configuration space $\mathbb{R}^{3n}$).

The total energy of the system is therefore a function $E : \Phi \to \mathbb{R}$ defined
on the subset $\Phi$ of the phase space $\mathbb{R}^{6n}$ and assuming values in the domain
$\mathbb{R}$ of real numbers.

In particular, if the system is closed, that is, no external forces are acting
on it, then by the law of conservation of energy, at each point of the set $\Phi$ of
states of the system the function $E$ will have the same value $E_0 \in \mathbb{R}$.


1.3.2 Elementary Classification of Mappings 


When a function $f : X \to Y$ is called a mapping, the value $f(x) \in Y$ that it
assumes at the element $x \in X$ is usually called the image of $x$.

The image of a set $A \subset X$ under the mapping $f : X \to Y$ is defined as
the set

$$f(A) := \{y \in Y \mid \exists x\,((x \in A) \wedge (y = f(x)))\}$$

consisting of the elements of $Y$ that are images of elements of $A$.

The set

$$f^{-1}(B) := \{x \in X \mid f(x) \in B\}$$

consisting of the elements of $X$ whose images belong to $B$ is called the
pre-image (or complete pre-image) of the set $B \subset Y$ (Fig. 1.6).

A mapping $f : X \to Y$ is said to be

surjective (a mapping of $X$ onto $Y$) if $f(X) = Y$;

injective (or an imbedding or injection) if for any elements $x_1$, $x_2$ of $X$

$$(f(x_1) = f(x_2)) \Rightarrow (x_1 = x_2),$$

that is, distinct elements have distinct images;

bijective (or a one-to-one correspondence) if it is both surjective and
injective.

Fig. 1.6.
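For a mapping between finite sets these notions can be checked by direct enumeration. In the sketch below a mapping f : X → Y is stored as a Python dict whose keys form X, with Y passed separately; this representation and the function names are assumptions made for the example.

```python
def image(f, A):
    """f(A): the set of images of the elements of A."""
    return {f[x] for x in A}

def preimage(f, B):
    """The complete pre-image of B: all x in X with f(x) in B."""
    return {x for x in f if f[x] in B}

def is_surjective(f, Y):
    """f(X) = Y."""
    return image(f, f.keys()) == set(Y)

def is_injective(f):
    """Distinct elements have distinct images."""
    return len(set(f.values())) == len(f)

def is_bijective(f, Y):
    return is_injective(f) and is_surjective(f, Y)

Y = {"a", "b", "c"}
f = {1: "a", 2: "b", 3: "a"}   # neither injective nor surjective onto Y
g = {1: "a", 2: "b", 3: "c"}   # a bijection of {1, 2, 3} onto Y
print(is_bijective(f, Y), is_bijective(g, Y))
```

Note that `preimage` is defined for `f` even though `f` is not bijective; for instance the pre-image of `{"c"}` under `f` is empty.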

If the mapping $f : X \to Y$ is bijective, that is, it is a one-to-one
correspondence between the elements of the sets $X$ and $Y$, there naturally arises
a mapping

$$f^{-1} : Y \to X,$$

defined as follows: if $f(x) = y$, then $f^{-1}(y) = x$, that is, to each element
$y \in Y$ one assigns the element $x \in X$ whose image under the mapping $f$ is $y$.
By the surjectivity of $f$ there exists such an element, and by the injectivity
of $f$, it is unique. Hence the mapping $f^{-1}$ is well-defined. This mapping is
called the inverse of the original mapping $f$.

It is clear from the construction of the inverse mapping that $f^{-1} : Y \to X$
is itself bijective and that its inverse $(f^{-1})^{-1} : X \to Y$ is the same as the
original mapping $f : X \to Y$.

Thus the property of two mappings of being inverses is reciprocal: if $f^{-1}$
is inverse for $f$, then $f$ is inverse for $f^{-1}$.

We remark that the symbol $f^{-1}(B)$ for the pre-image of a set $B \subset Y$
involves the symbol $f^{-1}$ for the inverse function; but it should be kept in
mind that the pre-image of a set is defined for any mapping $f : X \to Y$, even
if it is not bijective and hence has no inverse.
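The construction of the inverse can be followed step by step for a finite bijection. The sketch below again assumes that a mapping is stored as a dict; the function name `inverse` is hypothetical, and the two error cases correspond to the failure of injectivity and of surjectivity.

```python
def inverse(f, Y):
    """Return the inverse of a bijection f: X -> Y given as a dict.

    Raises ValueError if f is not injective or not surjective onto Y.
    """
    inv = {}
    for x, y in f.items():
        if y in inv:
            # Two different arguments share the image y.
            raise ValueError("f is not injective")
        inv[y] = x
    if set(inv) != set(Y):
        # Some element of Y has no pre-image.
        raise ValueError("f is not surjective")
    return inv

f = {1: "a", 2: "b", 3: "c"}
f_inv = inverse(f, {"a", "b", "c"})
print(f_inv["b"], inverse(f_inv, {1, 2, 3}) == f)
```

Inverting twice returns the original dict, in line with the remark that being inverse is a reciprocal property.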


1.3.3 Composition of Functions and Mutually Inverse Mappings 


The operation of composition of functions is on the one hand a rich source 
of new functions and on the other hand a way of resolving complex functions 
into simpler ones. 

If the mappings $f : X \to Y$ and $g : Y \to Z$ are such that one of them (in
our case $g$) is defined on the range of the other ($f$), one can construct a new
mapping

$$g \circ f : X \to Z,$$

whose values on elements of the set $X$ are defined by the formula

$$(g \circ f)(x) := g(f(x)).$$

The compound mapping $g \circ f$ so constructed is called the composition of
the mapping $f$ and the mapping $g$ (in that order!).

Figure 1.7 illustrates the construction of the composition of the mappings
$f$ and $g$.


Fig. 1.7. 


You have already encountered the composition of mappings many times, 
both in geometry, when studying the composition of rigid motions of the plane 
or space, and in algebra in the study of “complicated” functions obtained by 
composing the simplest elementary functions. 

The operation of composition sometimes has to be carried out several
times in succession, and in this connection it is useful to note that it is
associative, that is,

$$h \circ (g \circ f) = (h \circ g) \circ f.$$

Proof. Indeed,

$$(h \circ (g \circ f))(x) = h((g \circ f)(x)) = h(g(f(x))) =$$
$$= (h \circ g)(f(x)) = ((h \circ g) \circ f)(x). \quad \square$$

This circumstance, as in the case of addition and multiplication of several
numbers, makes it possible to omit the parentheses that prescribe the order
of the pairings.

If all the terms of a composition $f_n \circ \cdots \circ f_1$ are equal to the same function
$f$, we abbreviate it to $f^n$.

It is well known, for example, that the square root of a positive number
$a$ can be computed by successive approximations using the formula

$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right),$$

starting from any initial approximation $x_0 > 0$. This is none other than the
successive computation of $f^n(x_0)$, where $f(x) = \frac{1}{2}\left(x + \frac{a}{x}\right)$. Such a procedure, in
which the value of the function computed at each step becomes its argument
at the next step, is called a recursive procedure. Recursive procedures
are widely used in mathematics.
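The square-root iteration is a convenient place to see the n-fold composition at work. The sketch below is an illustration in floating-point arithmetic; the helper names `iterate` and `sqrt_step` and the choices a = 2, starting value 1, and six steps are assumptions made here.

```python
def iterate(f, n):
    """Return the n-fold composition of f with itself (identity for n = 0)."""
    def f_n(x):
        for _ in range(n):
            x = f(x)
        return x
    return f_n

def sqrt_step(a):
    """The map f(x) = (x + a / x) / 2 used to approximate sqrt(a)."""
    return lambda x: 0.5 * (x + a / x)

# Six steps of the iteration for a = 2, starting from 1.
approx = iterate(sqrt_step(2.0), 6)(1.0)
print(approx)
```

Each pass through the loop feeds the previous value back in as the next argument, which is exactly the recursive procedure described above.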

We further note that even when both compositions $g \circ f$ and $f \circ g$ are
defined, in general

$$g \circ f \ne f \circ g.$$

Indeed, let us take for example the two-element set $\{a, b\}$ and the
mappings $f : \{a, b\} \to a$ and $g : \{a, b\} \to b$. Then it is obvious that
$g \circ f : \{a, b\} \to b$ while $f \circ g : \{a, b\} \to a$.
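Both properties of composition, associativity and the dependence on order, can be observed on simple functions. In the sketch below `compose` is a helper defined for the example, and the functions `f`, `g`, `h` and the constant maps are illustrative choices.

```python
def compose(g, f):
    """Return the composition g o f, that is, x -> g(f(x))."""
    return lambda x: g(f(x))

f = lambda x: x + 1
g = lambda x: 2 * x
h = lambda x: x * x

# Associativity: h o (g o f) and (h o g) o f agree at every sample point.
left = compose(h, compose(g, f))
right = compose(compose(h, g), f)
print(all(left(x) == right(x) for x in range(-5, 6)))

# Order matters: the two constant maps on {"a", "b"} from the text.
const_a = lambda x: "a"   # plays the role of f
const_b = lambda x: "b"   # plays the role of g
print(compose(const_b, const_a)("a"), compose(const_a, const_b)("a"))
```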

The mapping $f : X \to X$ that assigns to each element of $X$ the element
itself, that is $x \overset{f}{\longmapsto} x$, will be denoted $e_X$ and called the identity mapping
on $X$.


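Lemma.

$$(g \circ f = e_X) \Rightarrow (g \text{ is surjective}) \wedge (f \text{ is injective}).$$

Proof. Indeed, if $f : X \to Y$, $g : Y \to X$, and $g \circ f = e_X : X \to X$, then

$$X = e_X(X) = (g \circ f)(X) = g(f(X)) \subset g(Y)$$

and hence $g$ is surjective.

Further, if $x_1 \in X$ and $x_2 \in X$, then

$$(x_1 \ne x_2) \Rightarrow (e_X(x_1) \ne e_X(x_2)) \Rightarrow ((g \circ f)(x_1) \ne (g \circ f)(x_2)) \Rightarrow$$
$$\Rightarrow (g(f(x_1)) \ne g(f(x_2))) \Rightarrow (f(x_1) \ne f(x_2)),$$

and therefore $f$ is injective. $\square$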


Using the operation of composition of mappings one can describe mutually 
inverse mappings. 


Proposition. The mappings $f : X \to Y$ and $g : Y \to X$ are bijective and
mutually inverse to each other if and only if $g \circ f = e_X$ and $f \circ g = e_Y$.


Proof. By the lemma the simultaneous fulfillment of the conditions $g \circ f = e_X$
and $f \circ g = e_Y$ guarantees the surjectivity and injectivity, that is, the
bijectivity, of both mappings.

These same conditions show that $y = f(x)$ if and only if $x = g(y)$. $\square$
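The two conditions of the proposition can be tested directly for finite mappings stored as dicts. The sketch below uses hypothetical helper names; its second pair of maps satisfies only the first condition, which by the lemma makes the second map surjective and the first injective without making them mutually inverse.

```python
def compose(g, f):
    """g o f for finite mappings stored as dicts."""
    return {x: g[f[x]] for x in f}

def identity(X):
    """The identity mapping on X."""
    return {x: x for x in X}

def mutually_inverse(f, g, X, Y):
    """Check both conditions: g o f = e_X and f o g = e_Y."""
    return compose(g, f) == identity(X) and compose(f, g) == identity(Y)

# A bijection and its inverse.
X, Y = {1, 2, 3}, {"a", "b", "c"}
f = {1: "a", 2: "b", 3: "c"}
g = {"a": 1, "b": 2, "c": 3}
print(mutually_inverse(f, g, X, Y))

# Only g2 o f2 = e_X2 holds: f2 is injective, g2 is surjective.
X2 = {1, 2}
f2 = {1: "a", 2: "b"}
g2 = {"a": 1, "b": 2, "c": 1}
print(compose(g2, f2) == identity(X2), mutually_inverse(f2, g2, X2, Y))
```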


In the preceding discussion we started with an explicit construction of the
inverse mapping. It follows from the proposition just proved that we could
have given a less intuitive, yet more symmetric definition of mutually inverse
mappings as those mappings that satisfy the two conditions $g \circ f = e_X$ and
$f \circ g = e_Y$. (In this connection, see Exercise 6 at the end of this section.)


1.3.4 Functions as Relations. The Graph of a Function


In conclusion we return once again to the concept of a function. We note that 
it has undergone a lengthy and rather complicated evolution. 

The term function first appeared in the years from 1673 to 1692 in works
of G. Leibniz (in a somewhat narrower sense, to be sure). By the year 1698
the term had become established in a sense close to the modern one through
the correspondence between Leibniz and Johann Bernoulli.¹³ (The letter of
Bernoulli usually cited in this regard dates to that same year.)

Many great mathematicians have participated in the formation of the
modern concept of functional dependence.

A description of a function that is nearly identical to the one given at the
beginning of this section can be found as early as the work of Euler
(mid-eighteenth century) who also introduced the notation $f(x)$. By the early
nineteenth century it had appeared in the textbooks of S. Lacroix.¹⁴ A vigorous
advocate of this concept of a function was N. I. Lobachevskii,¹⁵ who
noted that “a comprehensive view of theory admits only dependence
relationships in which the numbers connected with each other are understood as
if they were given as a single unit.”¹⁶ It is this idea of precise definition of
the concept of a function that we are about to explain.

The description of the concept of a function given at the beginning of 
this section is quite dynamic and reflects the essence of the matter. However, 
by modern canons of rigor it cannot be called a definition, since it uses the 
concept of a correspondence, which is equivalent to the concept of a func- 
tion. For the reader’s information we shall show here how the definition of a 
function can be given in the language of set theory. (It is interesting that the 
concept of a relation, to which we are now turning, preceded the concept of 
a function, even for Leibniz.) 


a. Relations 


Definition 1. A relation R is any set of ordered pairs (x, y). 


13 Johann Bernoulli (1667-1748) — one of the early representatives of the distin- 
guished Bernoulli family of Swiss scholars; he studied analysis, geometry and 
mechanics. He was one of the founders of the calculus of variations. He gave the 
first systematic exposition of the differential and integral calculus. 

14 S. F. Lacroix (1765-1843) - French mathematician and educator (professor at the
École Normale and the École Polytechnique, and member of the Paris Academy
of Sciences).

15 N. I. Lobachevskii (1792-1856) - great Russian scholar, to whom belongs the
credit, shared with the great German scientist C. F. Gauss (1777-1855) and
the outstanding Hungarian mathematician J. Bolyai (1802-1860), for having
discovered the non-Euclidean geometry that bears his name.

16 Lobachevskii, N.I. Complete Works, Vol. 5, Moscow-Leningrad: Gostekhizdat,
1951, p. 44 (Russian).

The set $X$ of first elements of the ordered pairs that constitute $R$ is called
the domain of definition of $R$, and the set $Y$ of second elements of these pairs
the range of values of $R$.

Thus, a relation can be interpreted as a subset $R$ of the direct product
$X \times Y$. If $X \subset X'$ and $Y \subset Y'$, then of course $R \subset X \times Y \subset X' \times Y'$, so
that a given relation can be defined as a subset of different sets.

Any set containing the domain of definition of a relation is called a domain
of departure for that relation. A set containing the range of values is called
a domain of arrival of the relation.

Instead of writing $(x, y) \in R$, we often write $xRy$ and say that $x$ is
connected with $y$ by the relation $R$.

If $R \subset X^2$, we say that the relation $R$ is defined on $X$.

Let us consider some examples. 


Example 13. The diagonal

$$\Delta = \{(a, b) \in X^2 \mid a = b\}$$

is a subset of $X^2$ defining the relation of equality between elements of $X$.
Indeed, $a\,\Delta\,b$ means that $(a, b) \in \Delta$, that is, $a = b$.


Example 14. Let $X$ be the set of lines in a plane.

Two lines $a \in X$ and $b \in X$ will be considered to be in the relation $R$,
and we shall write $aRb$, if $b$ is parallel to $a$. It is clear that this condition
distinguishes a set $R$ of pairs $(a, b)$ in $X^2$ such that $aRb$. It is known from
geometry that the relation of parallelism between lines has the following
properties:

$aRa$ (reflexivity);

$aRb \Rightarrow bRa$ (symmetry);

$(aRb) \wedge (bRc) \Rightarrow aRc$ (transitivity).


A relation $R$ having the three properties just listed, that is, reflexivity,¹⁷
symmetry, and transitivity, is usually called an equivalence relation. An
equivalence relation is denoted by the special symbol $\sim$, which in this case replaces
the letter $R$. Thus, in the case of an equivalence relation we shall write $a \sim b$
instead of $aRb$ and say that $a$ is equivalent to $b$.
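On a finite set the three defining properties can be verified by enumeration. In the sketch below a relation is a Python set of ordered pairs, and, as a simplifying assumption, a non-vertical line is represented by a pair (slope, intercept) so that two lines are parallel when their slopes coincide; all names are introduced for the example.

```python
def is_reflexive(R, X):
    return all((a, a) in R for a in X)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

def is_equivalence(R, X):
    return is_reflexive(R, X) and is_symmetric(R) and is_transitive(R)

# Lines y = k x + m encoded as (k, m); parallel means equal slope k.
lines = [(0, 0), (0, 1), (1, 0), (1, 2), (2, 5)]
parallel = {(a, b) for a in lines for b in lines if a[0] == b[0]}
print(is_equivalence(parallel, lines))

# For contrast, the strict inequality on numbers is not reflexive.
numbers = [0, 1, 2]
less = {(a, b) for a in numbers for b in numbers if a < b}
print(is_equivalence(less, numbers))
```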


Example 15. Let M be a set and X = P(M) the set of its subsets. For two 
arbitrary elements a and b of X = P(M), that is, for two subsets a and b of 
M, one of the following three possibilities always holds: a is contained in b; b 
is contained in a; a is not a subset of b and b is not a subset of a. 


17 For the sake of completeness it is useful to note that a relation $R$ is reflexive
if its domain of definition and its range of values are the same and the relation
$aRa$ holds for any element $a$ in the domain of $R$.

As an example of a relation $R$ on $X^2$, consider the relation of inclusion
for subsets of $M$, that is, make the definition

$$aRb := (a \subset b).$$

This relation obviously has the following properties:

$aRa$ (reflexivity);

$(aRb) \wedge (bRc) \Rightarrow aRc$ (transitivity);

$(aRb) \wedge (bRa) \Rightarrow a\,\Delta\,b$, that is, $a = b$ (antisymmetry).


A relation between pairs of elements of a set $X$ having these three
properties is usually called a partial ordering on $X$. For a partial ordering relation
on $X$, we often write $a \preceq b$ and say that $b$ follows $a$.

If the condition

$$\forall a\,\forall b\,((aRb) \vee (bRa))$$

holds in addition to the last two properties defining a partial ordering relation,
that is, any two elements of $X$ are comparable, the relation $R$ is called an
ordering, and the set $X$ with the ordering defined on it is said to be linearly
ordered.

The origin of this term comes from the intuitive image of the real line $\mathbb{R}$
on which a relation $a \le b$ holds between any pair of real numbers.
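The inclusion relation of Example 15 gives a concrete partial ordering that is not linear. The sketch below checks this on the subsets of a two-element set; the helper names and the sample sets are assumptions made for the illustration.

```python
from itertools import combinations

def power_set(M):
    """All subsets of a finite set M, as frozensets."""
    items = list(M)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def inclusion(X):
    """The relation a R b := (a is a subset of b) on a family X of sets."""
    return {(a, b) for a in X for b in X if a <= b}

def is_partial_order(R, X):
    reflexive = all((a, a) in R for a in X)
    transitive = all((a, d) in R for (a, b) in R for (c, d) in R if b == c)
    antisymmetric = all(a == b for (a, b) in R if (b, a) in R)
    return reflexive and transitive and antisymmetric

def is_linear_order(R, X):
    comparable = all((a, b) in R or (b, a) in R for a in X for b in X)
    return is_partial_order(R, X) and comparable

X = power_set({1, 2})
R = inclusion(X)
print(is_partial_order(R, X), is_linear_order(R, X))
```

The subsets {1} and {2} are not comparable by inclusion, which is why the second check fails, whereas the usual order on a finite set of numbers passes both.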


b. Functions and their graphs. A relation R is said to be functional if 
(Ry) A (rRy2) > (yi = Y2) - 


A functional relation is called a function. 

In particular, if X and Y are two sets, not necessarily distinct, a relation 
R C X xY between elements x of X and y of Y is a functional relation on X 
if for every x € X there exists a unique element y € Y in the given relation 
to x, that is, such that xRy holds. 

Such a functional relation R C X x Y is a mapping from X into Y, or a 
function from X into Y. 

We shall usually denote functions by the letter f. If f is a function, we 
shall write y = f(x) or x ↦ y (with f over the arrow), as before, rather than 
x f y, calling y = f(x) the value of f at x or the image of x under f. 

As we now see, assigning an element y ∈ Y “corresponding” to x ∈ X in 
accordance with the “rule” f, as was discussed in the original description of 
the concept of a function, amounts to exhibiting for each x ∈ X the unique 
y ∈ Y such that x f y, that is, (x, y) ∈ f ⊂ X × Y. 

The graph of a function f : X → Y, as understood in the original description, 
is the subset Γ of the direct product X × Y whose elements have 
the form (x, f(x)). Thus 


Γ := {(x, y) ∈ X × Y | y = f(x)}. 

In the new description of the concept of a function, in which we define it 
as a subset f ⊂ X × Y, of course, there is no longer any difference between 
a function and its graph. 

We have exhibited the theoretical possibility of giving a formal set- 
theoretic definition of a function, which reduces essentially to identifying a 
function and its graph. However, we do not intend to confine ourselves to that 
way of defining a function. At times it is convenient to define a functional 
relation analytically, at other times by giving a table of values, and at still 
other times by giving a verbal description of a process (algorithm) making 
it possible to find the element y ∈ Y corresponding to a given x ∈ X. With 
each method of presenting a function it is meaningful to ask how the function 
could have been defined using its graph. This problem can be stated as the 
problem of constructing the graph of the function. Defining numerical-valued 
functions by a good graphical representation is often useful because it makes 
the basic qualitative properties of the functional relation visualizable. One 
can also use graphs (nomograms) for computations; but, as a rule, only in 
cases where high precision is not required. For precise computations we do 
use the table definition of a function, but more often we use an algorithmic 
definition that can be implemented on a computer. 
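
As a small illustration of these three ways of presenting one and the same function, the sketch below defines x ↦ x² on a finite domain analytically, by a table, and by an algorithm, and checks that all three produce the same graph. The domain and all names are assumptions chosen for the example.

```python
domain = range(-3, 4)

# 1) Analytic definition: a formula.
analytic = lambda x: x * x

# 2) Table of values.
table = {-3: 9, -2: 4, -1: 1, 0: 0, 1: 1, 2: 4, 3: 9}

# 3) Algorithmic definition: add |x| to itself |x| times.
def algorithmic(x):
    total = 0
    for _ in range(abs(x)):
        total += abs(x)
    return total

def graph(f, xs):
    """The graph of f over xs: the set of pairs (x, f(x))."""
    return {(x, f(x)) for x in xs}

g1 = graph(analytic, domain)
g2 = graph(table.__getitem__, domain)
g3 = graph(algorithmic, domain)
print(g1 == g2 == g3)   # True: one function, three presentations
```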


1.3.5 Exercises 


1. The composition R₂ ∘ R₁ of the relations R₁ and R₂ is defined as follows: 

R₂ ∘ R₁ := {(x, z) | ∃y (x R₁ y ∧ y R₂ z)}. 

In particular, if R₁ ⊂ X × Y and R₂ ⊂ Y × Z, then R = R₂ ∘ R₁ ⊂ X × Z, and 

x R z := ∃y ((y ∈ Y) ∧ (x R₁ y) ∧ (y R₂ z)). 


a) Let Δ_X be the diagonal of X² and Δ_Y the diagonal of Y². Show that if the 
relations R₁ ⊂ X × Y and R₂ ⊂ Y × X are such that (R₂ ∘ R₁ = Δ_X) ∧ (R₁ ∘ R₂ = 
Δ_Y), then both relations are functional and define mutually inverse mappings of X 
and Y. 

b) Let R ⊂ X². Show that the condition of transitivity of the relation R is 
equivalent to the condition R ∘ R ⊂ R. 

c) The relation R′ ⊂ Y × X is called the transpose of the relation R ⊂ X × Y 
if (y R′ x) ⇔ (x R y). 

Show that a relation R ⊂ X² is antisymmetric if and only if R ∩ R′ ⊂ Δ_X. 

d) Verify that any two elements of X are connected (in some order) by the 
relation R ⊂ X² if and only if R ∪ R′ = X². 
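
The definitions used in this exercise are easy to experiment with on finite sets. The sketch below is not a solution of the exercise, only a check of the stated conditions on one example (the order ≤ on {1, 2, 3, 4} and one small bijection); the names `compose`, `transpose` and `diagonal` are illustrative.

```python
def compose(r2, r1):
    """R2 o R1 = {(x, z) | there is y with x R1 y and y R2 z}."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def transpose(r):
    """R' = {(y, x) | x R y}."""
    return {(y, x) for (x, y) in r}

def diagonal(X):
    return {(x, x) for x in X}

X = {1, 2, 3, 4}
leq = {(a, b) for a in X for b in X if a <= b}
square = {(a, b) for a in X for b in X}

print(compose(leq, leq) <= leq)                # True: transitivity
print(leq & transpose(leq) <= diagonal(X))     # True: antisymmetry
print(leq | transpose(leq) == square)          # True: any two elements comparable

# A bijection and its transpose compose to the diagonals.
f = {(1, "a"), (2, "b")}
print(compose(transpose(f), f) == diagonal({1, 2}))       # True
print(compose(f, transpose(f)) == diagonal({"a", "b"}))   # True
```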


2. Let f : X → Y be a mapping. The pre-image f⁻¹(y) ⊂ X of the element y ∈ Y 
is called the fiber over y. 


a) Find the fibers for the following mappings: 
pr₁ : X₁ × X₂ → X₁,   pr₂ : X₁ × X₂ → X₂. 

b) An element x₁ ∈ X will be considered to be connected with an element 
x₂ ∈ X by the relation R ⊂ X², and we shall write x₁ R x₂, if f(x₁) = f(x₂), that 
is, x₁ and x₂ both lie in the same fiber. 

Verify that R is an equivalence relation. 


c) Show that the fibers of a mapping f : X → Y do not intersect one another 
and that the union of all the fibers is the whole set X. 


d) Verify that any equivalence relation between elements of a set makes it 
possible to represent the set as a union of mutually disjoint equivalence classes of 
elements. 
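
The notion of a fiber can be made concrete on a finite set. The following sketch, under the assumption X = {0, ..., 9} and f(x) = x mod 3 (both chosen only for illustration), computes the fibers and checks on this one example that they are disjoint and cover X.

```python
from collections import defaultdict

def fibers(f, X):
    """Group the points of X by their image: y -> pre-image of y."""
    result = defaultdict(set)
    for x in X:
        result[f(x)].add(x)
    return dict(result)

X = range(10)
f = lambda x: x % 3
fib = fibers(f, X)

print(fib)   # {0: {0, 3, 6, 9}, 1: {1, 4, 7}, 2: {2, 5, 8}}

union = set().union(*fib.values())
total = sum(len(part) for part in fib.values())
print(union == set(X), total == len(X))   # True True: a partition of X
```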


3. Let f : X → Y be a mapping from X into Y. Show that if A and B are subsets 
of X, then 

a) (A ⊂ B) ⇒ (f(A) ⊂ f(B)) ⇏ (A ⊂ B), 
b) (A ≠ ∅) ⇒ (f(A) ≠ ∅), 
c) f(A ∩ B) ⊂ f(A) ∩ f(B), 
d) f(A ∪ B) = f(A) ∪ f(B); 
if A′ and B′ are subsets of Y, then 
e) (A′ ⊂ B′) ⇒ (f⁻¹(A′) ⊂ f⁻¹(B′)), 
f) f⁻¹(A′ ∩ B′) = f⁻¹(A′) ∩ f⁻¹(B′), 
g) f⁻¹(A′ ∪ B′) = f⁻¹(A′) ∪ f⁻¹(B′); 
if Y ⊃ A′ ⊃ B′, then 
h) f⁻¹(A′ \ B′) = f⁻¹(A′) \ f⁻¹(B′), 
i) f⁻¹(C_Y A′) = C_X f⁻¹(A′); 
and for any A ⊂ X and B′ ⊂ Y 
j) f⁻¹(f(A)) ⊃ A, 
k) f(f⁻¹(B′)) ⊂ B′. 
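
Before proving these relations it can help to see that the inclusions in c), j) and k) may be strict. The sketch below uses the non-injective, non-surjective mapping x ↦ x² on X = {−3, ..., 3}; the helper names and the particular sets are assumptions for illustration.

```python
def image(f, A):
    """f(A) = {f(x) | x in A}."""
    return {f(x) for x in A}

def preimage(f, X, B):
    """Pre-image of B: {x in X | f(x) in B}."""
    return {x for x in X if f(x) in B}

X = set(range(-3, 4))
f = lambda x: x * x
A, B = {-2, -1}, {1, 2}
B1 = {1, 2, -5}          # a subset of the target set, partly outside f(X)

# c) image of an intersection can be strictly smaller
print(image(f, A & B), image(f, A) & image(f, B))        # set() {1, 4}
# d) image of a union is the union of images
print(image(f, A | B) == image(f, A) | image(f, B))      # True
# j) the pre-image of f(A) can be strictly larger than A
print(preimage(f, X, image(f, A)))                       # {-2, -1, 1, 2}
# k) the image of the pre-image of B1 can be strictly smaller than B1
print(image(f, preimage(f, X, B1)))                      # {1}
```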
4. Show that the mapping f : X → Y is 
a) surjective if and only if f(f⁻¹(B′)) = B′ for every set B′ ⊂ Y; 
b) bijective if and only if 
(f⁻¹(f(A)) = A) ∧ (f(f⁻¹(B′)) = B′) 
for every set A ⊂ X and every set B′ ⊂ Y. 
5. Verify that the following statements about a mapping f : X → Y are equivalent: 
a) f is injective; 
b) f⁻¹(f(A)) = A for every A ⊂ X; 
c) f(A ∩ B) = f(A) ∩ f(B) for any two subsets A and B of X; 
d) f(A) ∩ f(B) = ∅ ⇔ A ∩ B = ∅; 
e) f(A \ B) = f(A) \ f(B) whenever X ⊃ A ⊃ B. 

6. a) If the mappings f : X → Y and g : Y → X are such that g ∘ f = e_X, where 
e_X is the identity mapping on X, then g is called a left inverse of f and f a right 
inverse of g. Show that, in contrast to the uniqueness of the inverse mapping, there 
may exist many one-sided inverse mappings. 

Consider, for example, the mappings f : X → Y and g : Y → X, where X is a 
one-element set and Y a two-element set, or the mappings of sequences given by 


f : (x₁, …, xₙ, …) ↦ (a, x₁, …, xₙ, …), 
g : (y₁, y₂, …, yₙ, …) ↦ (y₂, …, yₙ, …). 


b) Let f : X → Y and g : Y → Z be bijective mappings. Show that the mapping 
g ∘ f : X → Z is bijective and that (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹. 
c) Show that the equality 


(g ∘ f)⁻¹(C) = f⁻¹(g⁻¹(C)) 


holds for any mappings f : X → Y and g : Y → Z and any set C ⊂ Z. 


d) Verify that the mapping F : X × Y → Y × X defined by the correspondence 
(x, y) ↦ (y, x) is bijective. Describe the connection between the graphs of mutually 
inverse mappings f : X → Y and f⁻¹ : Y → X. 


7. a) Show that for any mapping f : X → Y the mapping F : X → X × Y defined 
by the correspondence x ↦ (x, f(x)) is injective. 


b) Suppose a particle is moving at uniform speed on a circle Y; let X be the 
time axis and x ↦ y (with f over the arrow) the correspondence between the time 
x ∈ X and the position y = f(x) ∈ Y of the particle. Describe the graph of the 
function f : X → Y in X × Y. 


8. a) For each of the examples 1-12 considered in Sect. 1.3 determine whether the 
mapping defined in the example is surjective, injective, or bijective or whether it 
belongs to none of these classes. 

b) Ohm’s law I = V/R connects the current I in a conductor with the potential 
difference V at the ends of the conductor and the resistance R of the conductor. 
Give sets X and Y for which some mapping O : X → Y corresponds to Ohm’s law. 
What set is the relation corresponding to Ohm’s law a subset of? 


c) Find the mappings G⁻¹ and L⁻¹ inverse to the Galilean and Lorentz 
transformations. 


9. a) A set S ⊂ X is stable with respect to a mapping f : X → X if f(S) ⊂ S. 
Describe the sets that are stable with respect to a shift of the plane by a given 
vector lying in the plane. 

b) A set I ⊂ X is invariant with respect to a mapping f : X → X if f(I) = I. 
Describe the sets that are invariant with respect to rotation of the plane about a 
fixed point. 

c) A point p ∈ X is a fixed point of a mapping f : X → X if f(p) = p. Verify 
that any composition of a shift, a rotation, and a similarity transformation of the 
plane has a fixed point, provided the coefficient of the similarity transformation is 
less than 1. 

d) Regarding the Galilean and Lorentz transformations as mappings of the 
plane into itself for which the point with coordinates (x, t) maps to the point with 
coordinates (x′, t′), find the invariant sets of these transformations. 


10. Consider the steady flow of a fluid (that is, the velocity at each point of the 
flow does not change over time). In time t a particle at point x of the flow will move 
to some new point f_t(x) of space. The mapping x ↦ f_t(x) that arises thereby on 
the points of space occupied by the flow depends on time and is called the mapping 
after time t. Show that f_{t₂} ∘ f_{t₁} = f_{t₁} ∘ f_{t₂} = f_{t₁+t₂} and f_t ∘ f_{−t} = e_X. 


1.4 Supplementary Material 


1.4.1 The Cardinality of a Set (Cardinal Numbers) 


The set X is said to be equipollent to the set Y if there exists a bijective 
mapping of X onto Y, that is, a point y ∈ Y is assigned to each x ∈ X, 
the elements of Y assigned to different elements of X are different, and every 
point of Y is assigned to some point of X. 

Speaking fancifully, each element x ∈ X has a seat all to itself in Y, and 
there are no vacant seats y ∈ Y. 

It is clear that the relation XRY thereby introduced is an equivalence 
relation. For that reason we shall write X ~ Y instead of X RY, in accordance 
with our earlier convention. 

The relation of equipollence partitions the collection of all sets into classes 
of mutually equivalent sets. The sets of an equivalence class have the same 
number of elements (they are equipollent), and sets from different equivalence 
classes do not. 

The class to which a set X belongs is called the cardinality of X, and also 
the cardinal or cardinal number of X. It is denoted card X. If X ~ Y, we 
write card X = card Y. 

The idea behind this construction is that it makes possible a comparison of 
the numbers of elements in sets without resorting to an intermediate count, 
that is, without measuring the number by comparing it with the natural 
numbers N = {1,2,3,...}. Doing the latter, as we shall soon see, is sometimes 
not even theoretically possible. 

The cardinal number of a set X is said to be not larger than the cardinal 
number of a set Y, and we write card X ≤ card Y, if X is equipollent to some 
subset of Y. 

Thus, 


(card X ≤ card Y) := ∃Z ⊂ Y (card X = card Z). 

If X ⊂ Y, it is clear that card X ≤ card Y. It turns out, however, that 
the relation X ⊂ Y does not exclude the inequality card Y ≤ card X, even 
when X is a proper subset of Y. 

For example, the correspondence x ↦ x/(1 − |x|) is a bijective mapping of the 
interval −1 < x < 1 of the real axis ℝ onto the entire axis. 
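
A bijection of this kind can be written down together with its inverse. The sketch below assumes the particular mapping x ↦ x/(1 − |x|), whose inverse is y ↦ y/(1 + |y|); the function names are illustrative, and a numerical round trip is of course only a sanity check, not a proof of bijectivity.

```python
import math

def to_line(x):
    """Map the open interval (-1, 1) onto the whole real line."""
    if not -1 < x < 1:
        raise ValueError("x must lie strictly between -1 and 1")
    return x / (1 - abs(x))

def to_interval(y):
    """Inverse mapping: the real line back onto (-1, 1)."""
    return y / (1 + abs(y))

samples = [-0.9, -0.5, 0.0, 0.25, 0.5, 0.9]
round_trip = [to_interval(to_line(x)) for x in samples]

print(to_line(0.5))   # 1.0
print(all(math.isclose(a, b, abs_tol=1e-12)
          for a, b in zip(samples, round_trip)))   # True
```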

The possibility of being equipollent to a proper subset of itself is a characteristic 
of infinite sets that Dedekind¹⁸ even suggested taking as the definition 
of an infinite set. Thus a set is called finite (in the sense of Dedekind) if it is 
not equipollent to any proper subset of itself; otherwise, it is called infinite. 

Just as the relation of inequality orders the real numbers on a line, the 
inequality just introduced orders the cardinal numbers of sets. To be specific, 
one can prove that the relation just constructed has the following properties: 


1° (card X ≤ card Y) ∧ (card Y ≤ card Z) ⇒ (card X ≤ card Z) (obvious). 


2° (card X ≤ card Y) ∧ (card Y ≤ card X) ⇒ (card X = card Y) (the 
Schröder-Bernstein theorem¹⁹). 


3° ∀X ∀Y (card X ≤ card Y) ∨ (card Y ≤ card X) (Cantor’s theorem). 


Thus the class of cardinal numbers is linearly ordered. 

We say that the cardinality of X is less than the cardinality of Y and write 
card X < card Y, if card X ≤ card Y but card X ≠ card Y. Thus (card X < 
card Y) := (card X ≤ card Y) ∧ (card X ≠ card Y). 

As before, let Ø be the empty set and P(X) the set of all subsets of the 
set X. Cantor made the following discovery: 


Theorem. card X < card P(X). 


Proof. The assertion is obvious for the empty set, so that from now on we 
shall assume X ≠ ∅. 

Since P(X) contains all one-element subsets of X, card X ≤ card P(X). 

To prove the theorem it now suffices to show that card X ≠ card P(X) if 
X ≠ ∅. 

Suppose, contrary to the assertion, that there exists a bijective mapping 
f : X → P(X). Consider the set A = {x ∈ X : x ∉ f(x)} consisting of the 
elements x ∈ X that do not belong to the set f(x) ∈ P(X) assigned to them 
by the bijection. Since A ∈ P(X), there exists a ∈ X such that f(a) = A. 
For the element a the relation a ∈ A is impossible by the definition of A, and 
the relation a ∉ A is impossible, also by the definition of A. We have thus 
reached a contradiction with the law of excluded middle. □ 
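
For a small finite set the diagonal construction of this proof can be checked exhaustively. The sketch below assumes X = {0, 1, 2}, enumerates every mapping f : X → P(X), and confirms that the set A = {x ∈ X : x ∉ f(x)} is never a value of f; all names are illustrative.

```python
from itertools import combinations, product

def powerset(xs):
    """All subsets of xs as frozensets."""
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def diagonal_set(f, X):
    """A = {x in X : x not in f(x)} for a mapping f given as a dict."""
    return frozenset(x for x in X if x not in f[x])

X = [0, 1, 2]
P = powerset(X)

# Every mapping f : X -> P(X), represented as a dict x -> subset.
all_maps = [dict(zip(X, values)) for values in product(P, repeat=len(X))]

# The diagonal set is missed by every f, so no f is surjective.
missed = all(diagonal_set(f, X) not in f.values() for f in all_maps)
print(len(all_maps), missed)   # 512 True
```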


18 R. Dedekind (1831-1916) - German algebraist who took an active part in the 
development of the theory of a real number. He was the first to propose the 
axiomatization of the set of natural numbers usually called the Peano axiom 
system after G. Peano (1858-1932), the Italian mathematician who formulated 
it somewhat later. 

19 F. Bernstein (1878-1956) - German mathematician, a student of G. Cantor. 
E. Schröder (1841-1902) - German mathematician. 

This theorem shows in particular that if infinite sets exist, then even 
“infinities” are not all the same. 


1.4.2 Axioms for Set Theory 


The purpose of the present subsection is to give the interested reader a picture of 
an axiom system that describes the properties of the mathematical object called a 
set and to illustrate the simplest consequences of those axioms. 


1°. (Axiom of extensionality) Sets A and B are equal if and only if they 
have the same elements. 

This means that we ignore all properties of the object known as a “set” except 
the property of having elements. In practice it means that if we wish to establish 
that A = B, we must verify that ∀x ((x ∈ A) ⇔ (x ∈ B)). 


2°. (Axiom of separation) To any set A and any property P there corresponds 
a set B whose elements are those elements of A, and only those, having property 
P. 

More briefly, it is asserted that if A is a set, then B = {x ∈ A | P(x)} is also a 
set. 

This axiom is used very frequently in mathematical constructions, when we 
select from a set the subset consisting of the elements having some property. 

For example, it follows from the axiom of separation that there exists an empty 
subset ∅_X = {x ∈ X | x ≠ x} in any set X. By virtue of the axiom of extensionality 
we conclude that ∅_X = ∅_Y for all sets X and Y, that is, the empty set is unique. 
We denote this set by ∅. 

It also follows from the axiom of separation that if A and B are sets, then 
A \ B = {x ∈ A | x ∉ B} is also a set. In particular, if M is a set and A a subset of 
M, then C_M A is also a set. 


3°. (Union axiom) For any set M whose elements are sets there exists a set 
⋃M, called the union of M and consisting of those elements and only those that 
belong to some element of M. 

If we use the phrase “family of sets” instead of “a set whose elements are sets”, 
the axiom of union assumes a more familiar sound: there exists a set consisting of 
the elements of the sets in the family. Thus, a union of sets is a set, and 

x ∈ ⋃M ⇔ ∃X ((X ∈ M) ∧ (x ∈ X)). 

When we take account of the axiom of separation, the union axiom makes it 
possible to define the intersection of the set M (or family of sets) as the set 


⋂M := {x ∈ ⋃M | ∀X ((X ∈ M) ⇒ (x ∈ X))}. 


4°. (Pairing axiom) For any sets X and Y there exists a set Z such that X and 
Y are its only elements. 

The set Z is denoted {X, Y} and is called the unordered pair of sets X and Y. 
The set Z consists of one element if X = Y. 

As we have already pointed out, the ordered pair (X, Y) differs from the unordered 
pair by the presence of some property possessed by one of the sets in the 
pair. For example, (X, Y) := {{X, X}, {X, Y}}. 
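
This encoding of the ordered pair can be imitated with Python frozensets, which play the role of sets that may themselves be elements of other sets. The sketch below is only an illustration; the helper names `pair`, `first` and `second` are assumptions.

```python
def pair(x, y):
    """Ordered pair (x, y) := {{x, x}, {x, y}} built from unordered pairs."""
    return frozenset({frozenset({x, x}), frozenset({x, y})})

def first(p):
    """The first component is the element common to every member of p."""
    (common,) = frozenset.intersection(*p)
    return common

def second(p):
    """The second component: the other element if any, else the first one."""
    rest = frozenset.union(*p) - {first(p)}
    return next(iter(rest)) if rest else first(p)

print(pair(1, 2) == pair(2, 1))                   # False: order matters
print(pair(1, 1) == frozenset({frozenset({1})}))  # True: {{1}, {1}} = {{1}}
print(first(pair(1, 2)), second(pair(1, 2)))      # 1 2
print(first(pair(3, 3)), second(pair(3, 3)))      # 3 3
```

The point of the construction is that the pair is determined by sets alone, yet (x, y) = (u, v) holds exactly when x = u and y = v.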

Thus, the unordered pair makes it possible to introduce the ordered pair, and 
the ordered pair makes it possible to introduce the direct product of sets by using 
the axiom of separation and the following important axiom. 

5°. (Power set axiom) For any set X there exists a set P(X) having each 
subset of X as an element, and having no other elements. 

In short, there exists a set consisting of all the subsets of a given set. 

We can now verify that the ordered pairs (x, y), where x ∈ X and y ∈ Y, really 
do form a set, namely 


X × Y = {p ∈ P(P(X) ∪ P(Y)) | (p = (x, y)) ∧ (x ∈ X) ∧ (y ∈ Y)}. 


Axioms 1°-5° limit the possibility of forming new sets. Thus, by Cantor’s the- 
orem (which asserts that card X < card P(X)) there is an element in the set P(X) 
that does not belong to X. Therefore the “set of all sets” does not exist. And it 
was precisely on this “set” that Russell’s paradox was based. 

In order to state the next axiom we introduce the concept of the successor X⁺ 
of the set X. By definition X⁺ = X ∪ {X}. More briefly, the one-element set {X} 
is adjoined to X. 

Further, a set is called inductive if the empty set is one of its elements and the 
successor of each of its elements also belongs to it. 


6°. (Axiom of infinity) There exist inductive sets. 

When we take Axioms 1°-4° into account, the axiom of infinity makes it possible 
to construct a standard model of the set ℕ₀ of natural numbers (in the sense of 
von Neumann),²⁰ by defining ℕ₀ as the intersection of all inductive sets, that is, 
the smallest inductive set. The elements of ℕ₀ are 


∅, ∅⁺ = ∅ ∪ {∅} = {∅}, {∅}⁺ = {∅} ∪ {{∅}}, …, 


which are a model for what we denote by the symbols 0,1, 2,... and call the natural 
numbers. 
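
The first few elements of this model can be built explicitly. The sketch below uses Python frozensets for sets and a hypothetical helper `von_neumann(n)` that applies the successor operation n times to the empty set; it illustrates the construction and does not replace the axiomatic definition.

```python
def successor(x):
    """x+ = x U {x}."""
    return x | frozenset({x})

EMPTY = frozenset()

def von_neumann(n):
    """The set obtained from the empty set by n successor steps."""
    x = EMPTY
    for _ in range(n):
        x = successor(x)
    return x

one, two, three = von_neumann(1), von_neumann(2), von_neumann(3)

print(one == frozenset({EMPTY}))          # True: 1 = {0}
print(two == frozenset({EMPTY, one}))     # True: 2 = {0, 1}
print([len(von_neumann(n)) for n in range(5)])   # [0, 1, 2, 3, 4]
print(one in three, one < three)          # True True: element and proper subset
```

In this model the set standing for n has exactly n elements, namely the sets standing for 0, 1, ..., n − 1.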


7°. (Axiom of replacement) Let F(x, y) be a statement (more precisely, a 
formula) such that for every x₀ in the set X there exists a unique object y₀ such 
that F(x₀, y₀) is true. Then the objects y for which there exists an element x ∈ X 
such that F(x, y) is true form a set. 

We shall make no use of this axiom in our construction of analysis. 

Axioms 1°-7° constitute the axiom system known as the Zermelo-Fraenkel 
axioms.²¹ 

To this system another axiom is usually added, one that is independent of 
Axioms 1°-7° and used very frequently in analysis. 


20 J. von Neumann (1903-1957) — American mathematician who worked in func- 
tional analysis, the mathematical foundations of quantum mechanics, topological 
groups, game theory, and mathematical logic. He was one of the leaders in the 
creation of the first computers. 

21 E. Zermelo (1871-1953) — German mathematician. A. Fraenkel (1891-1965) — 
German (later, Israeli) mathematician. 

8°. (Axiom of choice) For any family of nonempty sets there exists a set C 
such that for each set X in the family X ∩ C consists of exactly one element. 

In other words, from each set of the family one can choose exactly one repre- 
sentative in such a way that the representatives chosen form a set C. 

The axiom of choice, known as Zermelo’s axiom in mathematics, has been the 
subject of heated debates among specialists. 


1.4.3 Remarks on the Structure of Mathematical Propositions 
and Their Expression in the Language of Set Theory 


In the language of set theory there are two basic, or atomic, types of mathematical 
statements: the assertion x ∈ A, that an object x is an element of a 
set A, and the assertion A = B, that the sets A and B are identical. (However, 
when the axiom of extensionality is taken into account, the second statement 
is a combination of statements of the first type: (x ∈ A) ⇔ (x ∈ B).) 

A complex statement or logical formula can be constructed from atomic 
statements by means of logical operators (the connectors ¬, ∧, ∨, ⇒ and the 
quantifiers ∀, ∃) and by use of parentheses ( ). When this is done, the formation 
of any statement, no matter how complicated, reduces to carrying out the 
following elementary logical operations: 

a) forming a new statement by placing the negation sign before some 
statement and enclosing the result in parentheses; 

b) forming a new statement by substituting the necessary connectors ∧, 
∨, and ⇒ between two statements and enclosing the result in parentheses; 

c) forming the statement “for every object x property P holds” (written 
as ∀x P(x)) or the statement “there exists an object x having property P” 
(written as ∃x P(x)). 

For example, the cumbersome expression 


∃x (P(x) ∧ (∀y (P(y) ⇒ (y = x)))) 


means that there exists an object having property P and such that if y is 
any object having this property, then y = x. In brief: there exists a unique 
object x having property P. This statement is usually written ∃!x P(x), and 
we shall use this abbreviation. 

To simplify the writing of a statement, as already pointed out, one attempts 
to omit as many parentheses as possible while retaining the unambiguous 
interpretation of the statement. To this end, in addition to the priority 
of the operators ¬, ∧, ∨, ⇒ mentioned earlier, we assume that the symbols in 
a formula are most strongly connected by the symbols ∈, =, then ∃, ∀, and 
then the connectors ¬, ∧, ∨, ⇒. 

Taking account of this convention, we can now write 


∃!x P(x) := ∃x (P(x) ∧ ∀y (P(y) ⇒ y = x)). 

We also make the following widely used abbreviations: 


(∀x ∈ X) P := ∀x (x ∈ X ⇒ P(x)), 
(∃x ∈ X) P := ∃x (x ∈ X ∧ P(x)), 
(∀x > a) P := ∀x (x ∈ ℝ ∧ x > a ⇒ P(x)), 
(∃x > a) P := ∃x (x ∈ ℝ ∧ x > a ∧ P(x)). 


Here ℝ, as always, denotes the set of real numbers. 

Taking account of these abbreviations and the rules a), b), c) for constructing 
complex statements, we can, for example, give an unambiguous 
expression 

(lim_{x→a} f(x) = A) := ∀ε > 0 ∃δ > 0 ∀x ∈ ℝ (0 < |x − a| < δ ⇒ |f(x) − A| < ε) 

of the fact that the number A is the limit of the function f : ℝ → ℝ at the 
point a ∈ ℝ. 

For us perhaps the most important result of what has been said in this 
subsection will be the rules for forming the negation of a statement containing 
quantifiers. 

The negation of the statement “for some x, P(x) is true” means that “for 
any x, P(x) is false”, while the negation of the statement “for any x, P(x) is 
true” means that “there exists an x such that P(x) is false”. 
Thus, 


¬∃x P(x) ⇔ ∀x ¬P(x), 
¬∀x P(x) ⇔ ∃x ¬P(x). 

We recall also (see the exercises in Sect. 1.1) that 


¬(P ∧ Q) ⇔ ¬P ∨ ¬Q, 
¬(P ∨ Q) ⇔ ¬P ∧ ¬Q, 
¬(P ⇒ Q) ⇔ P ∧ ¬Q. 
On the basis of what has just been said, one can conclude, for example, 
that 

¬((∀x > a) P) ⇔ (∃x > a) ¬P. 


It would of course be wrong to express the right-hand side of this last relation 
as (∃x ≤ a) ¬P. 
Indeed, 

¬((∀x > a) P) := ¬(∀x (x ∈ ℝ ∧ x > a ⇒ P(x))) ⇔ 
⇔ ∃x ¬(x ∈ ℝ ∧ x > a ⇒ P(x)) ⇔ 
⇔ ∃x ((x ∈ ℝ ∧ x > a) ∧ ¬P(x)) =: (∃x > a) ¬P. 

If we take into account the structure of an arbitrary statement mentioned 
above, we can now use the negations just constructed for the simplest statements 
to form the negation of any particular statement. 

For example, 


¬(lim_{x→a} f(x) = A) ⇔ ∃ε > 0 ∀δ > 0 ∃x ∈ ℝ 
(0 < |x − a| < δ ∧ |f(x) − A| ≥ ε). 


The practical importance of the rule for forming a negation is connected, 
in particular, with the method of proof by contradiction, in which the truth 
of a statement P is deduced from the fact that the statement ¬P is false. 
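
Over a finite domain the quantifiers ∀ and ∃ correspond to Python's `all` and `any`, so the negation rules can be tried out directly. The sketch below assumes a finite stand-in domain of integers and a sample property P, both chosen only for illustration; it shows that negating a restricted quantifier keeps the restriction x > a and negates only P.

```python
from itertools import product

domain = range(-5, 6)        # a finite stand-in for the real numbers
P = lambda x: x * x > 3      # a sample property
a = 1

# not (for all x > a: P(x))  versus  (there exists x > a: not P(x))
lhs = not all(P(x) for x in domain if x > a)
rhs = any(not P(x) for x in domain if x > a)

# The tempting but wrong form: also negating the restriction x > a.
wrong = any(not P(x) for x in domain if x <= a)

print(lhs == rhs)     # True
print(lhs, wrong)     # False True: the wrong form gives a different answer

# The propositional rules, checked on the whole truth table;
# the implication p => q is modelled as (not p) or q.
rules_hold = all(
    (not (p and q)) == ((not p) or (not q))
    and (not (p or q)) == ((not p) and (not q))
    and (not ((not p) or q)) == (p and (not q))
    for p, q in product([False, True], repeat=2)
)
print(rules_hold)     # True
```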


1.4.4 Exercises 


1. a) Prove the equipollence of the closed interval {x ∈ ℝ | 0 ≤ x ≤ 1} and the open 
interval {x ∈ ℝ | 0 < x < 1} of the real line ℝ both using the Schröder-Bernstein 
theorem and by direct exhibition of a suitable bijection. 


b) Analyze the following proof of the Schröder-Bernstein theorem: 
(card X ≤ card Y) ∧ (card Y ≤ card X) ⇒ (card X = card Y). 


Proof. It suffices to prove that if the sets X, Y, and Z are such that X ⊃ Y ⊃ Z 
and card X = card Z, then card X = card Y. Let f : X → Z be a bijection. A 
bijection g : X → Y can be defined, for example, as follows: 


g(x) = f(x), if x ∈ fⁿ(X) \ fⁿ(Y) for some n ∈ ℕ, 
g(x) = x otherwise. 


Here fⁿ = f ∘ ⋯ ∘ f is the nth iteration of the mapping f and ℕ is the set of 
natural numbers. □ 


2. a) Starting from the definition of a pair, verify that the definition of the direct 
product X × Y of sets X and Y given in Subsect. 1.4.2 is unambiguous, that is, the 
set P(P(X) ∪ P(Y)) contains all ordered pairs (x, y) in which x ∈ X and y ∈ Y. 
b) Show that the mappings f : X → Y from one given set X into another given 
set Y themselves form a set M(X, Y). 


c) Verify that if R is a set of ordered pairs (that is, a relation), then the first 
elements of the pairs belonging to R (like the second elements) form a set. 


3. a) Using the axioms of extensionality, pairing, separation, union, and infinity, 
verify that the following statements hold for the elements of the set ℕ₀ of natural 
numbers in the sense of von Neumann: 


1° x = y ⇒ x⁺ = y⁺; 
2° (∀x ∈ ℕ₀) (x⁺ ≠ ∅); 
3° x⁺ = y⁺ ⇒ x = y; 
4° (∀x ∈ ℕ₀) (x ≠ ∅ ⇒ (∃y ∈ ℕ₀) (x = y⁺)). 

b) Using the fact that ℕ₀ is an inductive set, show that the following statements 
hold for any of its elements x and y (which in turn are themselves sets): 

1° card x ≤ card x⁺; 

2° card ∅ < card x⁺; 

3° card x < card y ⇔ card x⁺ < card y⁺; 

4° card x < card x⁺; 

5° card x < card y ⇒ card x⁺ ≤ card y; 

6° x = y ⇔ card x = card y; 

7° (x ⊂ y) ∨ (x ⊃ y). 

c) Show that in any subset X of ℕ₀ there exists a (minimal) element x_m such 
that (∀x ∈ X) (card x_m ≤ card x). (If you have difficulty doing so, come back to 
this problem after reading Chapter 2.) 


4. We shall deal only with sets. Since a set consisting of different elements may 
itself be an element of another set, logicians usually denote all sets by lowercase 
letters. In the present exercise, it is very convenient to do so. 


a) Verify that the statement 

∀x ∃y ∀z (z ∈ y ⇔ ∃w (z ∈ w ∧ w ∈ x)) 


expresses the axiom of union, according to which y is the union of the sets belonging 
to x. 


b) State which axioms of set theory are represented by the following statements: 


∀x ∀y ∀z ((z ∈ x ⇔ z ∈ y) ⇔ x = y), 


∀x ∀y ∃z ∀v (v ∈ z ⇔ (v = x ∨ v = y)), 


∀x ∃y ∀z (z ∈ y ⇔ ∀u (u ∈ z ⇒ u ∈ x)), 


∃x (∀y (¬∃z (z ∈ y) ⇒ y ∈ x) ∧ ∀w (w ∈ x ⇒ 
⇒ ∀u (∀v (v ∈ u ⇔ (v = w ∨ v ∈ w)) ⇒ u ∈ x))). 

c) Verify that the formula 

∀z (z ∈ f ⇒ (∃x₁ ∃y₁ (x₁ ∈ x ∧ y₁ ∈ y ∧ z = (x₁, y₁)))) ∧ 
∧ ∀x₁ (x₁ ∈ x ⇒ ∃y₁ ∃z (y₁ ∈ y ∧ z = (x₁, y₁) ∧ z ∈ f)) ∧ 
∧ ∀x₁ ∀y₁ ∀y₂ (∃z₁ ∃z₂ (z₁ ∈ f ∧ z₂ ∈ f ∧ z₁ = (x₁, y₁) ∧ 
∧ z₂ = (x₁, y₂)) ⇒ y₁ = y₂) 

imposes three successive restrictions on the set f: f is a subset of x × y; the projection 
of f on x is equal to x; to each element x₁ of x there corresponds exactly one y₁ in 
y such that (x₁, y₁) ∈ f. 

Thus what we have here is a definition of a mapping f : x → y. 


This example shows yet again that the formal expression of a statement is by no 
means always the shortest and most transparent in comparison with its expression 
in ordinary language. Taking this circumstance into account, we shall henceforth 
use logical symbolism only to the extent that it seems useful to us to achieve greater 
compactness or clarity of exposition. 


5. Let f : X → Y be a mapping. Write the logical negation of each of the following 
statements: 


a) f is surjective; 
b) f is injective; 
c) f is bijective. 


6. Let X and Y be sets and f ⊂ X × Y. Write what it means to say that the set 
f is not a function. 


2 The Real Numbers 


Mathematical theories, as a rule, find uses because they make it possible to 
transform one set of numbers (the initial data) into another set of numbers 
constituting the intermediate or final purpose of the computations. For that 
reason numerical-valued functions occupy a special place in mathematics and 
its applications. These functions (more precisely, the so-called differentiable 
functions) constitute the main object of study of classical analysis. But, as 
you may already have sensed from your school experience, and as will soon be 
confirmed, any description of the properties of these functions that is at all 
complete from the point of view of modern mathematics is impossible with- 
out a precise definition of the set of real numbers, on which these functions 
operate. 

Numbers in mathematics are like time in physics: everyone knows what 
they are, and only experts find them hard to understand. This is one of the 
basic mathematical abstractions, which seems destined to undergo significant 
further development. A very full separate course could be devoted to this sub- 
ject. At present we intend only to unify what is basically already known to 
the reader about real numbers from high school, exhibiting as axioms the 
fundamental and independent properties of numbers. In doing this, our pur- 
pose is to give a precise definition of real numbers suitable for subsequent 
mathematical use, paying particular attention to their property of completeness 
or continuity, which contains the germ of the idea of passage to the limit, 
the basic nonarithmetical operation of analysis. 


2.1 The Axiom System and some General Properties 
of the Set of Real Numbers 


2.1.1 Definition of the Set of Real Numbers 


Definition 1. A set ℝ is called the set of real numbers and its elements are 
real numbers if the following list of conditions, called the axiom system 
of the real numbers, holds. 

(I) AXIOMS FOR ADDITION 


An operation 
+ : ℝ × ℝ → ℝ, 


(the operation of addition) is defined, assigning to each ordered pair (x, y) of 
elements x, y of ℝ a certain element x + y ∈ ℝ, called the sum of x and y. 
This operation satisfies the following conditions: 


1₊. There exists a neutral, or identity element 0 (called zero) such that 
x + 0 = 0 + x = x 


for every x ∈ ℝ. 
2₊. For every element x ∈ ℝ there exists an element −x ∈ ℝ called the 
negative of x such that 


x + (−x) = (−x) + x = 0. 


3₊. The operation + is associative, that is, the relation 
x + (y + z) = (x + y) + z 


holds for any elements x, y, z of ℝ. 


4₊. The operation + is commutative, that is, 
x + y = y + x 


for any elements x, y of ℝ. 


If an operation is defined on a set G satisfying axioms 1₊, 2₊, and 3₊, 
we say that a group structure is defined on G or that G is a group. If the 
operation is called addition, the group is called an additive group. If it is also 
known that the operation is commutative, that is, condition 4₊ holds, the 
group is called commutative or Abelian.¹ 

Thus, Axioms 1₊-4₊ assert that ℝ is an additive Abelian group. 


(II) AXIOMS FOR MULTIPLICATION 


An operation 
• : ℝ × ℝ → ℝ, 


(the operation of multiplication) is defined, assigning to each ordered pair 
(x, y) of elements x, y of ℝ a certain element x · y ∈ ℝ, called the product of 
x and y. This operation satisfies the following conditions: 


1 N. H. Abel (1802-1829) — outstanding Norwegian mathematician, who proved 
that the general algebraic equation of degree higher than four cannot be solved 
by radicals. 

1•. There exists a neutral, or identity element 1 ∈ ℝ \ 0 (called one) such 
that 

x · 1 = 1 · x = x 

for every x ∈ ℝ. 


2•. For every element x ∈ ℝ \ 0 there exists an element x⁻¹ ∈ ℝ, called 
the inverse or reciprocal of x, such that 

x · x⁻¹ = x⁻¹ · x = 1. 


3•. The operation • is associative, that is, the relation 


x · (y · z) = (x · y) · z 
holds for any elements x, y, z of ℝ. 


4•. The operation • is commutative, that is, 
x · y = y · x 
for any elements x, y of ℝ. 
We remark that with respect to the operation of multiplication the set 
ℝ \ 0, as one can verify, is a (multiplicative) group. 

(I, II) THE CONNECTION BETWEEN ADDITION AND MULTIPLICATION 


Multiplication is distributive with respect to addition, that is 
(x + y)z = xz + yz 


for all x, y, z ∈ ℝ. 
We remark that by the commutativity of multiplication, this equality 
continues to hold if the order of the factors is reversed on either side. 
If two operations satisfying these axioms are defined on a set G, then G 
is called a field. 
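
The real numbers are not the only field. As an illustration outside the text, the sketch below checks the axioms of groups (I), (II) and (I, II) by brute force for the integers modulo n, with addition and multiplication taken modulo n; the function name `is_field` and the choice of example are assumptions. For n = 5 all axioms hold, while for n = 6 the element 2 has no reciprocal.

```python
def is_field(n):
    """Brute-force check of the field axioms for the integers modulo n."""
    if n < 2:
        return False                      # the axioms require 1 != 0
    F = range(n)
    add = lambda x, y: (x + y) % n
    mul = lambda x, y: (x * y) % n
    triples = [(x, y, z) for x in F for y in F for z in F]

    addition = (
        all(add(x, 0) == x for x in F)                            # neutral 0
        and all(any(add(x, y) == 0 for y in F) for x in F)        # negatives
        and all(add(x, add(y, z)) == add(add(x, y), z) for x, y, z in triples)
        and all(add(x, y) == add(y, x) for x in F for y in F)
    )
    multiplication = (
        all(mul(x, 1) == x for x in F)                            # neutral 1
        and all(any(mul(x, y) == 1 for y in F) for x in F if x != 0)
        and all(mul(x, mul(y, z)) == mul(mul(x, y), z) for x, y, z in triples)
        and all(mul(x, y) == mul(y, x) for x in F for y in F)
    )
    distributive = all(mul(add(x, y), z) == add(mul(x, z), mul(y, z))
                       for x, y, z in triples)
    return addition and multiplication and distributive

print(is_field(5))   # True
print(is_field(6))   # False: 2 has no multiplicative inverse modulo 6
```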
(III) ORDER AXIOMS 


Between elements of ℝ there is a relation ≤, that is, for elements x, y ∈ ℝ 
one can determine whether x ≤ y or not. Here the following conditions must 
hold: 


0≤. ∀x ∈ ℝ (x ≤ x). 

1≤. (x ≤ y) ∧ (y ≤ x) ⇒ (x = y). 
2≤. (x ≤ y) ∧ (y ≤ z) ⇒ (x ≤ z). 
3≤. ∀x ∈ ℝ ∀y ∈ ℝ (x ≤ y) ∨ (y ≤ x). 

The relation ≤ on ℝ is called inequality. 

A set on which there is a relation between pairs of elements satisfying 
axioms 0≤, 1≤, and 2≤, as you know, is said to be partially ordered. If in 
addition axiom 3≤ holds, that is, any two elements are comparable, the set 
is linearly ordered. Thus the set of real numbers is linearly ordered by the 
relation of inequality between elements. 


(I, III) THE CONNECTION BETWEEN ADDITION AND ORDER ON R 


If x,y,z are elements of R, then 


(x<y) => (@+2z<y4+2). 


(II, III) THE CONNECTION BETWEEN MULTIPLICATION AND ORDER ON R 


If x and y are elements of R, then 


$$(0 \le x) \land (0 \le y) \Rightarrow (0 \le x \cdot y).$$ 


(IV) THE AXIOM OF COMPLETENESS (CONTINUITY) 


If $X$ and $Y$ are nonempty subsets of $\mathbb{R}$ having the property that $x \le y$ for 
every $x \in X$ and every $y \in Y$, then there exists $c \in \mathbb{R}$ such that $x \le c \le y$ 
for all $x \in X$ and $y \in Y$. 


We now have a complete list of axioms such that any set on which these 
axioms hold can be considered a concrete realization or model of the real 
numbers. 

This definition does not formally require any preliminary knowledge about 
numbers, and from it “by turning on mathematical thought” we should, again 
formally, obtain as theorems all the other properties of real numbers. On the 
subject of this axiomatic formalism we would like to make a few informal 
remarks. 

Imagine that you had not passed from the stage of adding apples, cubes, 
or other named quantities to the addition of abstract natural numbers; you 
had not studied the measurement of line segments and arrived at rational 
numbers; you did not know the great discovery of the ancients that the diag- 
onal of a square is incommensurable with its side, so that its length cannot 
be a rational number, that is, that irrational numbers are needed; you did not 
have the concept of “greater” or “smaller” that arises in the process of mea- 
surement; you did not picture order to yourself using, for example, the real 
line. If all these preliminaries had not occurred, the axioms just listed would 
not be perceived as the outcome of intellectual progress; they would seem at 
the very least a strange, and in any case arbitrary, fruit of the imagination. 

In relation to any abstract system of axioms, at least two questions arise 
immediately. 




First, are these axioms consistent? That is, does there exist a set satisfying 
all the conditions just listed? This is the problem of consistency of the axioms. 

Second, does the given system of axioms determine the mathematical 
object uniquely? That is, as the logicians would say, is the axiom system 
categorical? Here uniqueness must be understood as follows. If two people 
A and B construct models independently, say of number systems $\mathbb{R}_A$ and 
$\mathbb{R}_B$, satisfying the axioms, then a bijective correspondence can be established 
between the systems $\mathbb{R}_A$ and $\mathbb{R}_B$, say $f : \mathbb{R}_A \to \mathbb{R}_B$, preserving the arithmetic 
operations and the order, that is, 

$$f(x + y) = f(x) + f(y),$$ 
$$f(x \cdot y) = f(x) \cdot f(y),$$ 
$$x \le y \Leftrightarrow f(x) \le f(y).$$ 

In this case, from the mathematical point of view, $\mathbb{R}_A$ and $\mathbb{R}_B$ are merely 
distinct but equally valid realizations (models) of the real numbers (for 
example, $\mathbb{R}_A$ might be the set of infinite decimal fractions and $\mathbb{R}_B$ the set of 
points on the real line). Such realizations are said to be isomorphic and the 
mapping $f$ is called an isomorphism. The result of this mathematical activity 
is thus not about any particular realization, but about each model in the 
class of isomorphic models of the given axiom system. 

We shall not discuss the questions posed above, but instead confine our- 
selves to giving informative answers to them. 

A positive answer to the question of consistency of an axiom system is 
always of a hypothetical nature. In relation to numbers it has the following 
appearance: Starting from the axioms of set theory that we have accepted 
(see Subsect. 1.4.2), one can construct the set of natural numbers, then the 
set of rational numbers, and finally the set R of real numbers satisfying all 
the properties listed. 

The categoricity of the axiom system for the real numbers can be 
established. Those who wish to do so may obtain it independently by 
solving Exercises 23 and 24 at the end of this section. 


2.1.2 Some General Algebraic Properties of Real Numbers 


We shall show by examples how the known properties of numbers can be 
obtained from these axioms. 


a. Consequences of the Addition Axioms 

1°. There is only one zero in the set of real numbers. 


Proof. If $0_1$ and $0_2$ are both zeros in $\mathbb{R}$, then by definition of zero, 

$$0_1 = 0_1 + 0_2 = 0_2 + 0_1 = 0_2.$$ □ 


2°. Each element of the set of real numbers has a unique negative. 


Proof. If $x_1$ and $x_2$ are both negatives of $x \in \mathbb{R}$, then 

$$x_1 = x_1 + 0 = x_1 + (x + x_2) = (x_1 + x) + x_2 = 0 + x_2 = x_2.$$ □ 


Here we have used successively the definition of zero, the definition of the 
negative, the associativity of addition, again the definition of the negative, 
and finally, again the definition of zero. 


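3°. In the set of real numbers $\mathbb{R}$ the equation 

$$a + x = b$$ 

has the unique solution 

$$x = b + (-a).$$ 

Proof. This follows from the existence and uniqueness of the negative of every 
element $a \in \mathbb{R}$: 

$$(a + x = b) \Leftrightarrow ((x + a) + (-a) = b + (-a)) \Leftrightarrow$$ 
$$\Leftrightarrow (x + (a + (-a)) = b + (-a)) \Leftrightarrow (x + 0 = b + (-a)) \Leftrightarrow$$ 
$$\Leftrightarrow (x = b + (-a)).$$ □ 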


The expression b + (—a) can also be written as b — a. This is the shorter 
and more common way of writing it, to which we shall adhere. 


b. Consequences of the Multiplication Axioms 

1°. There is only one multiplicative unit in the real numbers. 


2°. For each $x \ne 0$ there is only one reciprocal $x^{-1}$. 
3°. For $a \in \mathbb{R} \setminus 0$, the equation $a \cdot x = b$ has the unique solution $x = b \cdot a^{-1}$. 


The proofs of these propositions, of course, merely repeat the proofs of the 
corresponding propositions for addition (except for a change in the symbol 
and the name of the operation); they are therefore omitted. 


c. Consequences of the Axiom Connecting Addition and Multiplication 

Applying the additional axiom (I, II) connecting addition and 
multiplication, we obtain further consequences. 


1°. For any $x \in \mathbb{R}$ 

$$x \cdot 0 = 0 \cdot x = 0.$$ 

Proof. 

$$(x \cdot 0 = x \cdot (0 + 0) = x \cdot 0 + x \cdot 0) \Rightarrow (x \cdot 0 = x \cdot 0 + (-(x \cdot 0)) = 0).$$ □ 

From this result, incidentally, one can see that if $x \in \mathbb{R} \setminus 0$, then $x^{-1} \in \mathbb{R} \setminus 0$. 

2°. $(x \cdot y = 0) \Rightarrow (x = 0) \lor (y = 0)$. 


Proof. If, for example, $y \ne 0$, then by the uniqueness of the solution of the 
equation $x \cdot y = 0$ for $x$, we find $x = 0 \cdot y^{-1} = 0$. □ 


3°. For any $x \in \mathbb{R}$ 

$$-x = (-1) \cdot x.$$ 


Proof. $x + (-1) \cdot x = (1 + (-1)) \cdot x = 0 \cdot x = x \cdot 0 = 0$, and the assertion now 
follows from the uniqueness of the negative of a number. □ 


4°. For any $x \in \mathbb{R}$ 

$$(-1)(-x) = x.$$ 

Proof. This follows from 3° and the uniqueness of the negative of $-x$. □ 


5°. For any $x \in \mathbb{R}$ 

$$(-x) \cdot (-x) = x \cdot x.$$ 


Proof. 

$$(-x)(-x) = ((-1) \cdot x)(-x) = (x \cdot (-1))(-x) = x((-1)(-x)) = x \cdot x.$$ 


Here we have made successive use of the preceding propositions and the 
commutativity and associativity of multiplication. □ 


d. Consequences of the Order Axioms 

We begin by noting that the relation $x \le y$ (read “x is less than or equal 
to y”) can also be written as $y \ge x$ (“y is greater than or equal to x”); 
when $x \ne y$, the relation $x \le y$ is written $x < y$ (read “x is less than y”) 
or $y > x$ (read “y is greater than x”), and is called strict inequality. 


1°. For any $x$ and $y$ in $\mathbb{R}$ precisely one of the following relations holds: 

$$x < y, \qquad x = y, \qquad x > y.$$ 


Proof. This follows from the definition of strict inequality just given and 
axioms 1≤ and 3≤. □ 


2°. For any $x, y, z \in \mathbb{R}$ 

$$(x \le y) \land (y < z) \Rightarrow (x < z),$$ 
$$(x < y) \land (y \le z) \Rightarrow (x < z).$$ 

Proof. We prove the first assertion as an example. By Axiom 2≤, which 
asserts that the inequality relation is transitive, we have 

$$(x \le y) \land (y < z) \Leftrightarrow (x \le y) \land (y \le z) \land (y \ne z) \Rightarrow (x \le z).$$ 

It remains to be verified that $x \ne z$. But if this were not the case, we would 
have 

$$(x \le y) \land (y < z) \Leftrightarrow (z \le y) \land (y < z) \Leftrightarrow (z \le y) \land (y \le z) \land (y \ne z).$$ 

By Axiom 1≤ this relation would imply 

$$(y = z) \land (y \ne z),$$ 

which is a contradiction. □ 


e. Consequences of the Axioms Connecting Order with Addition and Multiplication 

If in addition to the axioms of addition, multiplication, 
and order, we use axioms (I, III) and (II, III), which connect the order with the 
arithmetic operations, we can obtain, for example, the following propositions. 


1°. For any $x, y, z, w \in \mathbb{R}$ 

$$(x < y) \Rightarrow (x + z) < (y + z),$$ 
$$(0 < x) \Rightarrow (-x < 0),$$ 
$$(x \le y) \land (z \le w) \Rightarrow (x + z) \le (y + w),$$ 
$$(x \le y) \land (z < w) \Rightarrow (x + z < y + w).$$ 


Proof. We shall verify the first of these assertions. 
By definition of strict inequality and the axiom (I, III) we have 

$$(x < y) \Rightarrow (x \le y) \Rightarrow (x + z) \le (y + z).$$ 

It remains to be verified that $x + z \ne y + z$. Indeed, 

$$((x + z) = (y + z)) \Rightarrow (x = (y + z) - z = y + (z - z) = y),$$ 

which contradicts the assumption $x < y$. □ 

2°. If $x, y, z \in \mathbb{R}$, then 

$$(0 < x) \land (0 < y) \Rightarrow (0 < xy),$$ 
$$(x < 0) \land (y < 0) \Rightarrow (0 < xy),$$ 
$$(x < 0) \land (0 < y) \Rightarrow (xy < 0),$$ 
$$(x < y) \land (0 < z) \Rightarrow (xz < yz),$$ 
$$(x < y) \land (z < 0) \Rightarrow (yz < xz).$$ 

Proof. We shall verify the first of these assertions. By definition of strict 
inequality and the axiom (II, III) we have 

$$(0 < x) \land (0 < y) \Rightarrow (0 \le x) \land (0 \le y) \Rightarrow (0 \le xy).$$ 

Moreover, $0 \ne xy$ since, as already shown, 

$$(x \cdot y = 0) \Rightarrow (x = 0) \lor (y = 0).$$ 

Let us further verify, for example, the third assertion: 

$$(x < 0) \land (0 < y) \Rightarrow (0 < -x) \land (0 < y) \Rightarrow$$ 
$$\Rightarrow (0 < (-x) \cdot y) \Rightarrow (0 < ((-1) \cdot x)y) \Rightarrow$$ 
$$\Rightarrow (0 < (-1) \cdot (xy)) \Rightarrow (0 < -(xy)) \Rightarrow (xy < 0).$$ □ 


The reader is now invited to prove the remaining relations independently 
and also to verify that if nonstrict inequality holds in one of the parentheses 
on the left-hand side, then the inequality on the right-hand side will also be 
nonstrict. 


3°. $0 < 1$. 


Proof. We know that $1 \in \mathbb{R} \setminus 0$, that is $0 \ne 1$. If we assume $1 < 0$, then by 
what was just proved, 

$$(1 < 0) \land (1 < 0) \Rightarrow (0 < 1 \cdot 1) \Rightarrow (0 < 1).$$ 

But we know that for any pair of numbers $x, y \in \mathbb{R}$ exactly one of the 
possibilities $x < y$, $x = y$, $x > y$ actually holds. Since $0 \ne 1$ and the assumption 
$1 < 0$ implies the relation $0 < 1$, which contradicts it, the only remaining 
possibility is the one in the statement of the proposition. □ 


4°. $(0 < x) \Rightarrow (0 < x^{-1})$ and $(0 < x) \land (x < y) \Rightarrow (0 < y^{-1}) \land (y^{-1} < x^{-1})$. 


Proof. Let us verify the first of these assertions. First of all, $x^{-1} \ne 0$. 
Assuming $x^{-1} < 0$, we obtain 

$$(x^{-1} < 0) \land (0 < x) \Rightarrow (x \cdot x^{-1} < 0) \Rightarrow (1 < 0).$$ 

This contradiction completes the proof. □ 


We recall that numbers larger than zero are called positive and those less 
than zero negative. 

Thus we have shown, for example, that 1 is a positive number, that the 
product of a positive and a negative number is a negative number, and that 
the reciprocal of a positive number is also positive. 


2.1.3 The Completeness Axiom and the Existence 
of a Least Upper (or Greatest Lower) Bound of a Set of Numbers 


Definition 2. A set $X \subset \mathbb{R}$ is said to be bounded above (resp. bounded below) 
if there exists a number $c \in \mathbb{R}$ such that $x \le c$ (resp. $c \le x$) for all $x \in X$. 


The number c in this case is called an upper bound (resp. lower bound) of 
the set X. It is also called a majorant (resp. minorant) of X. 


Definition 3. A set that is bounded both above and below is called bounded. 


Definition 4. An element $a \in X$ is called the largest or maximal (resp. 
smallest or minimal) element of $X$ if $x \le a$ (resp. $a \le x$) for all $x \in X$. 


We now introduce some notation and at the same time give a formal 
expression to the definition of maximal and minimal elements: 


$$(a = \max X) := (a \in X \land \forall x \in X\ (x \le a)),$$ 
$$(a = \min X) := (a \in X \land \forall x \in X\ (a \le x)).$$ 


Along with the notation $\max X$ (read “the maximum of X”) and $\min X$ 
(read “the minimum of X”) we also use the respective expressions 
$\max_{x \in X} x$ and $\min_{x \in X} x$. 

It follows immediately from the order axiom 1≤ that if there is a maximal 
(resp. minimal) element in a set of numbers, it is the only one. 
However, not every set, not even every bounded set, has a maximal or 
minimal element. 
For example, the set $X = \{x \in \mathbb{R} \mid 0 \le x < 1\}$ has a minimal element. 
But, as one can easily verify, it has no maximal element. 


Definition 5. The smallest number that bounds a set $X \subset \mathbb{R}$ from above 
is called the least upper bound (or the exact upper bound) of $X$ and denoted 
$\sup X$ (read “the supremum of X”) or $\sup_{x \in X} x$. 


This is the basic concept of the present subsection. Thus 

$$(s = \sup X) := \forall x \in X\ \bigl((x \le s) \land (\forall s' < s\ \exists x' \in X\ (s' < x'))\bigr).$$ 


The expression in the first set of parentheses on the right-hand side here 
says that s is an upper bound for X; the expression in the second set says that 
s is the smallest number having this property. More precisely, the expression 
in the second set of parentheses asserts that any number smaller than s is 
not an upper bound of X. 

The concept of the greatest lower bound (or exact lower bound) of a set 
$X$ is introduced similarly as the largest of the lower bounds of $X$. 

Definition 6. 

$$(i = \inf X) := \forall x \in X\ \bigl((i \le x) \land (\forall i' > i\ \exists x' \in X\ (x' < i'))\bigr).$$ 


Along with the notation $\inf X$ (read “the infimum of X”) one also uses the 
notation $\inf_{x \in X} x$ for the greatest lower bound of $X$. 


Thus we have given the following definitions: 

$$\sup X := \min \{c \in \mathbb{R} \mid \forall x \in X\ (x \le c)\},$$ 
$$\inf X := \max \{c \in \mathbb{R} \mid \forall x \in X\ (c \le x)\}.$$ 


But we said above that not every set has a minimal or maximal element. 
Therefore the definitions we have adopted for the least upper bound and 
greatest lower bound require an argument, provided by the following lemma. 


Lemma. (The least upper bound principle). Every nonempty set of real num- 
bers that is bounded from above has a unique least upper bound. 


Proof. Since we already know that the minimal element of a set of numbers 
is unique, we need only verify that the least upper bound exists. 

Let $X \subset \mathbb{R}$ be a given set and $Y = \{y \in \mathbb{R} \mid \forall x \in X\ (x \le y)\}$. By 
hypothesis, $X \ne \varnothing$ and $Y \ne \varnothing$. Then, by the completeness axiom there 
exists $c \in \mathbb{R}$ such that $\forall x \in X\ \forall y \in Y\ (x \le c \le y)$. The number $c$ is therefore 
both a majorant of $X$ and a minorant of $Y$. Being a majorant of $X$, $c$ is an 
element of $Y$. But then, as a minorant of $Y$, it must be the minimal element 
of $Y$. Thus $c = \min Y = \sup X$. □ 


Naturally the existence and uniqueness of the greatest lower bound of a 
set of numbers that is bounded from below is analogous, that is, the following 
proposition holds. 


Lemma. ($X$ bounded below) $\Rightarrow$ ($\exists!\, \inf X$). 

We shall not take time to give the proof. 

We now return to the set $X = \{x \in \mathbb{R} \mid 0 \le x < 1\}$. By the lemma just 
proved it must have a least upper bound. By the very definition of the set $X$ 
and the definition of the least upper bound, it is obvious that $\sup X \le 1$. 

To prove that $\sup X = 1$ it is thus necessary to verify that for any number 
$q < 1$ there exists $x \in X$ such that $q < x$; simply put, this means merely 
that there are numbers between $q$ and 1. This of course, is also easy to prove 
independently (for example, by showing that $q < 2^{-1}(q + 1) < 1$), but we shall 
not do so at this point, since such questions will be discussed systematically 
and in detail in the next section. 

As for the greatest lower bound, it always coincides with the minimal 
element of a set, if such an element exists. Thus, from this consideration 
alone we have inf X = 0 in the present example. 
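
The claim that no number q < 1 bounds X from above can be spot-checked with exact rational arithmetic. The sketch below is only an illustration under stated assumptions: it works with rational q (Python's `Fraction`), and the helper names `in_X` and `witness_above` are hypothetical, not notation from the text. For 0 ≤ q < 1 it uses the midpoint 2⁻¹(q + 1) mentioned above; for q < 0 the element 0 already suffices.

```python
from fractions import Fraction


def in_X(x):
    """Membership test for X = {x : 0 <= x < 1}."""
    return 0 <= x < 1


def witness_above(q):
    """For a rational q < 1, return an element of X that is larger than q."""
    if q >= 1:
        raise ValueError("q must be less than 1")
    if q < 0:
        return Fraction(0)  # 0 is in X and already exceeds q
    return (q + 1) / 2  # midpoint between q and 1


for q in [Fraction(-3), Fraction(0), Fraction(1, 2), Fraction(999, 1000)]:
    x = witness_above(q)
    print(q, "<", x, "and x in X:", in_X(x))
```

Each q < 1 therefore fails to be an upper bound of X, which together with sup X ≤ 1 gives sup X = 1.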

Other, more substantive examples of the use of the concepts introduced 
here will be encountered in the next section. 




2.2 The Most Important Classes of Real Numbers 
and Computational Aspects of Operations 
with Real Numbers 


2.2.1 The Natural Numbers and the Principle 
of Mathematical Induction 


a. Definition of the Set of Natural Numbers 

The numbers of the form 
$1$, $1 + 1$, $(1 + 1) + 1$, and so forth are denoted respectively by $1, 2, 3, \ldots$ and 
so forth and are called natural numbers. 

Such a definition will be meaningful only to one who already has a com- 
plete picture of the natural numbers, including the notation for them, for 
example in the decimal system of computation. 

The continuation of such a process is by no means always unique, so that 
the ubiquitous “and so forth” actually requires a clarification provided by 
the fundamental principle of mathematical induction. 


Definition 1. A set $X \subset \mathbb{R}$ is inductive if for each number $x \in X$, it also 
contains $x + 1$. 


For example, R is an inductive set; the set of positive numbers is also 
inductive. 


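The intersection $X = \bigcap_{\alpha \in A} X_\alpha$ of any family of inductive sets $X_\alpha$, if not 
empty, is an inductive set. 


Indeed, 

$$\Bigl(x \in X = \bigcap_{\alpha \in A} X_\alpha\Bigr) \Rightarrow \bigl(\forall \alpha \in A\ (x \in X_\alpha)\bigr) \Rightarrow$$ 
$$\Rightarrow \bigl(\forall \alpha \in A\ ((x + 1) \in X_\alpha)\bigr) \Rightarrow \Bigl((x + 1) \in \bigcap_{\alpha \in A} X_\alpha = X\Bigr).$$ 
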
We now adopt the following definition. 


Definition 2. The set of natural numbers is the smallest inductive set con- 
taining 1, that is, the intersection of all inductive sets that contain 1. 


The set of natural numbers is denoted N; its elements are called natural 
numbers. 

From the set-theoretic point of view it might be more rational to begin 
the natural numbers with 0, that is, to introduce the set of natural numbers 
as the smallest inductive set containing 0; however, it is more convenient for 
us to begin numbering with 1. 

The following fundamental and widely used principle is a direct corollary 
of the definition of the set of natural numbers. 

b. The Principle of Mathematical Induction 

If a subset $E$ of the set of 
natural numbers $\mathbb{N}$ is such that $1 \in E$ and together with each number $x \in E$, 
the number $x + 1$ also belongs to $E$, then $E = \mathbb{N}$. 

Thus, 

$$(E \subset \mathbb{N}) \land (1 \in E) \land (\forall x \in E\ (x \in E \Rightarrow (x + 1) \in E)) \Rightarrow E = \mathbb{N}.$$ 


Let us illustrate this principle in action by using it to prove several useful 
properties of the natural numbers that we will be using constantly from now 
on. 


1°. The sum and product of natural numbers are natural numbers. 


Proof. Let $m, n \in \mathbb{N}$; we shall show that $(m + n) \in \mathbb{N}$. We denote by $E$ the set 
of natural numbers $n$ for which $(m + n) \in \mathbb{N}$ for all $m \in \mathbb{N}$. Then $1 \in E$ since 
$(m \in \mathbb{N}) \Rightarrow ((m + 1) \in \mathbb{N})$ for any $m \in \mathbb{N}$. If $n \in E$, that is, $(m + n) \in \mathbb{N}$, then 
$(n + 1) \in E$ also, since $(m + (n + 1)) = ((m + n) + 1) \in \mathbb{N}$. By the principle 
of induction, $E = \mathbb{N}$, and we have proved that addition does not lead outside 
of $\mathbb{N}$. 

Similarly, taking $E$ to be the set of natural numbers $n$ for which $(m \cdot n) \in \mathbb{N}$ 
for all $m \in \mathbb{N}$, we find that $1 \in E$, since $m \cdot 1 = m$, and if $n \in E$, that is, 
$m \cdot n \in \mathbb{N}$, then $m \cdot (n + 1) = mn + m$ is the sum of two natural numbers, which 
belongs to $\mathbb{N}$ by what was just proved above. Thus $(n \in E) \Rightarrow ((n + 1) \in E)$, 
and so by the principle of induction $E = \mathbb{N}$. □ 


2°. $(n \in \mathbb{N}) \land (n \ne 1) \Rightarrow ((n - 1) \in \mathbb{N})$. 


Proof. Consider the set $E$ consisting of all real numbers of the form $n - 1$, 
where $n$ is a natural number different from 1; we shall show that $E = \mathbb{N}$. 
Since $1 \in \mathbb{N}$, it follows that $2 := (1 + 1) \in \mathbb{N}$ and hence $1 = (2 - 1) \in E$. 
If $m \in E$, then $m = n - 1$, where $n \in \mathbb{N}$; then $m + 1 = (n + 1) - 1$, 
and since $n + 1 \in \mathbb{N}$, we have $(m + 1) \in E$. By the principle of induction we 
conclude that $E = \mathbb{N}$. □ 


3°. For any $n \in \mathbb{N}$ the set $\{x \in \mathbb{N} \mid n < x\}$ contains a minimal element, 
namely 

$$\min\{x \in \mathbb{N} \mid n < x\} = n + 1.$$ 


Proof. We shall show that the set $E$ of $n \in \mathbb{N}$ for which the assertion holds 
coincides with $\mathbb{N}$. 
We first verify that $1 \in E$, that is, 

$$\min\{x \in \mathbb{N} \mid 1 < x\} = 2.$$ 

We shall also verify this assertion by the principle of induction. Let 

$$M = \{x \in \mathbb{N} \mid (x = 1) \lor (2 \le x)\}.$$ 

By definition of $M$ we have $1 \in M$. Then if $x \in M$, either $x = 1$, in which 
case $x + 1 = 2 \in M$, or else $2 \le x$, and then $2 \le (x + 1)$, and once again 
$(x + 1) \in M$. Thus $M = \mathbb{N}$, and hence if $(x \ne 1) \land (x \in \mathbb{N})$, then $2 \le x$, that 
is, indeed $\min\{x \in \mathbb{N} \mid 1 < x\} = 2$. Hence $1 \in E$. 

We now show that if $n \in E$, then $(n + 1) \in E$. 

We begin by remarking that if $x \in \{x \in \mathbb{N} \mid n + 1 < x\}$, then 

$$(x - 1) = y \in \{y \in \mathbb{N} \mid n < y\}.$$ 

For, by what has already been proved, every natural number is at least as 
large as 1; therefore $(n + 1 < x) \Rightarrow (1 < x) \Rightarrow (x \ne 1)$, and then by the 
assertion in 2° we have $(x - 1) = y \in \mathbb{N}$. 

Now let $n \in E$, that is, $\min\{y \in \mathbb{N} \mid n < y\} = n + 1$. Then $x - 1 = y \ge n + 1$ 
and $x \ge n + 2$. Hence, 

$$(x \in \{x \in \mathbb{N} \mid n + 1 < x\}) \Rightarrow (x \ge n + 2)$$ 

and consequently, $\min\{x \in \mathbb{N} \mid n + 1 < x\} = n + 2$, that is, $(n + 1) \in E$. 
By the principle of induction $E = \mathbb{N}$, and 3° is now proved. □ 


As immediate corollaries of 2° and 3° above, we obtain the following 
properties (4°, 5°, and 6°) of the natural numbers. 


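4°. $(m \in \mathbb{N}) \land (n \in \mathbb{N}) \land (n < m) \Rightarrow (n + 1 \le m)$. 


5°. The number $(n + 1) \in \mathbb{N}$ is the immediate successor of the number $n \in \mathbb{N}$; 
that is, if $n \in \mathbb{N}$, there are no natural numbers $x$ satisfying $n < x < n + 1$. 


6°. If $n \in \mathbb{N}$ and $n \ne 1$, then $(n - 1) \in \mathbb{N}$ and $(n - 1)$ is the immediate 
predecessor of $n$ in $\mathbb{N}$; that is, if $n \in \mathbb{N}$, there are no natural numbers $x$ 
satisfying $n - 1 < x < n$. 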

We now prove one more property of the set of natural numbers. 


7°. In any nonempty subset of the set of natural numbers there is a minimal 
element. 


Proof. Let $M \subset \mathbb{N}$. If $1 \in M$, then $\min M = 1$, since $\forall n \in \mathbb{N}\ (1 \le n)$. 

Now suppose $1 \notin M$, that is, $1 \in E = \mathbb{N} \setminus M$. The set $E$ must contain a 
natural number $n$ such that all natural numbers not larger than $n$ belong to 
$E$, but $(n + 1) \in M$. If there were no such $n$, the set $E \subset \mathbb{N}$, which contains 
1, would contain along with each of its elements $n$, the number $(n + 1)$ also; 
by the principle of induction, it would therefore equal $\mathbb{N}$. But the latter is 
impossible, since $\mathbb{N} \setminus E = M \ne \varnothing$. 

The number $(n + 1)$ so found must be the smallest element of $M$, since 
there are no natural numbers between $n$ and $n + 1$, as we have seen. □ 


2.2.2 Rational and Irrational Numbers 


a. The Integers 


Definition 3. The union of the set of natural numbers, the set of negatives 
of natural numbers, and zero is called the set of integers and is denoted Z. 


Since, as has already been proved, addition and multiplication of natural 
numbers do not take us outside N, it follows that these same operations on 
integers do not lead outside of Z. 


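Proof. Indeed, if $m, n \in \mathbb{Z}$, either one of these numbers is zero, and then the 
sum $m + n$ equals the other number, so that $(m + n) \in \mathbb{Z}$ and $m \cdot n = 0 \in \mathbb{Z}$, 
or both numbers are non-zero. In the latter case, either $m, n \in \mathbb{N}$ and then 
$(m + n) \in \mathbb{N} \subset \mathbb{Z}$ and $(m \cdot n) \in \mathbb{N} \subset \mathbb{Z}$, or $(-m), (-n) \in \mathbb{N}$ and then 
$m \cdot n = ((-1)m)((-1)n) \in \mathbb{N}$, or $(-m), n \in \mathbb{N}$ and then $(-m \cdot n) \in \mathbb{N}$, that 
is, $m \cdot n \in \mathbb{Z}$, or, finally, $m, -n \in \mathbb{N}$ and then $(-m \cdot n) \in \mathbb{N}$ and once again 
$m \cdot n \in \mathbb{Z}$. □ 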


Thus Z is an Abelian group with respect to addition. With respect to 
multiplication Z is not a group, nor is Z \ 0, since the reciprocals of the 
integers are not in Z (except the reciprocals of 1 and —1). 


Proof. Indeed, if $m \in \mathbb{Z}$ and $m \ne 0, 1$, then assuming first that $m \in \mathbb{N}$, we 
have $0 < 1 < m$, and, since $m \cdot m^{-1} = 1 > 0$, we must have $0 < m^{-1} < 1$ 
(see the consequences of the order axioms in the previous subsection). Thus 
$m^{-1} \notin \mathbb{Z}$. The case when $m$ is a negative integer different from $-1$ reduces 
immediately to the one already considered. □ 


When $k = m \cdot n^{-1} \in \mathbb{Z}$ for two integers $m, n \in \mathbb{Z}$, that is, when $m = k \cdot n$ 
for some $k \in \mathbb{Z}$, we say that $m$ is divisible by $n$ or a multiple of $n$, or that $n$ 
is a divisor of $m$. 

The divisibility of integers reduces immediately via suitable sign changes, 
that is, through multiplication by —1 when necessary, to the divisibility of 
the corresponding natural numbers. In this context it is studied in number 
theory. 

We recall without proof the so-called fundamental theorem of arithmetic, 
which we shall use in studying certain examples. 

A number $p \in \mathbb{N}$, $p \ne 1$, is prime if it has no divisors in $\mathbb{N}$ except 1 and $p$. 


The fundamental theorem of arithmetic. Each natural number admits 
a representation as a product 

$$n = p_1 \cdots p_k,$$ 

where $p_1, \ldots, p_k$ are prime numbers. This representation is unique except for 
the order of the factors. 

Numbers $m, n \in \mathbb{Z}$ are said to be relatively prime if they have no common 
divisors except 1 and —1. 

It follows in particular from this theorem that if the product m-n of 
relatively prime numbers m and n is divisible by a prime p, then one of the 
two numbers is also divisible by p. 
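
To see the representation n = p_1 ⋯ p_k concretely, here is a minimal sketch that computes it by trial division. Trial division is an assumption made for illustration (the text recalls the theorem without proof or algorithm), and `prime_factors` is a hypothetical helper name.

```python
from math import prod


def prime_factors(n):
    """Return the prime factors of a natural number n in nondecreasing order."""
    if n < 1:
        raise ValueError("n must be a natural number")
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:  # divide out every copy of p
            factors.append(p)
            n //= p
        p += 1
    if n > 1:  # whatever remains has no divisor up to its square root, so it is prime
        factors.append(n)
    return factors


for n in (360, 97, 1001):
    fs = prime_factors(n)
    print(n, "=", " * ".join(map(str, fs)), "| product check:", prod(fs) == n)
```

Sorting the factors fixes one representative of the representation, which the theorem asserts is unique up to order. For n = 1 the sketch returns the empty list, corresponding to the empty product.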


b. The Rational Numbers 


Definition 4. Numbers of the form $m \cdot n^{-1}$, where $m, n \in \mathbb{Z}$, are called 
rational. 


We denote the set of rational numbers by Q. 

Thus, the ordered pair $(m, n)$ of integers defines the rational number 
$q = m \cdot n^{-1}$ if $n \ne 0$. 

The number $q = m \cdot n^{-1}$ can also be written as a quotient² of $m$ and $n$, 
that is, as a so-called rational fraction $\frac{m}{n}$. 

The rules you learned in school for operating with rational numbers in 
terms of their representation as fractions follow immediately from the 
definition of a rational number and the axioms for real numbers. In particular, 
“the value of a fraction is unchanged when both numerator and denominator 
are multiplied by the same non-zero integer”, that is, the fractions $\frac{mk}{nk}$ and 
$\frac{m}{n}$ represent the same rational number. In fact, since $(nk)(k^{-1}n^{-1}) = 1$, that 
is $(n \cdot k)^{-1} = k^{-1} \cdot n^{-1}$, we have $(mk)(nk)^{-1} = (mk)(k^{-1}n^{-1}) = m \cdot n^{-1}$. 

Thus the different ordered pairs (m,n) and (mk,nk) define the same 
rational number. Consequently, after suitable reductions, any rational number 
can be presented as an ordered pair of relatively prime integers. 

On the other hand, if the pairs $(m_1, n_1)$ and $(m_2, n_2)$ define the same 
rational number, that is, $m_1 \cdot n_1^{-1} = m_2 \cdot n_2^{-1}$, then $m_1 n_2 = m_2 n_1$, and if, 
for example, $m_1$ and $n_1$ are relatively prime, it follows from the corollary 
of the fundamental theorem of arithmetic mentioned above that 
$n_2 \cdot n_1^{-1} = m_2 \cdot m_1^{-1} = k \in \mathbb{Z}$. 

We have thus demonstrated that two ordered pairs $(m_1, n_1)$ and $(m_2, n_2)$ 
define the same rational number if and only if they are proportional. That 
is, there exists an integer $k \in \mathbb{Z}$ such that, for example, $m_2 = k m_1$ and 
$n_2 = k n_1$. 
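
The reduction of a pair to relatively prime terms and the proportionality test can be sketched as follows. The helper names `reduce_pair` and `same_rational` are hypothetical, and normalizing the sign so that the denominator is positive is an extra convention assumed here, not something the text requires.

```python
from math import gcd


def reduce_pair(m, n):
    """Return the relatively prime pair defining the same rational number as (m, n)."""
    if n == 0:
        raise ZeroDivisionError("the second component must be non-zero")
    g = gcd(m, n)  # positive, because n != 0
    m, n = m // g, n // g
    if n < 0:  # assumed convention: keep the denominator positive
        m, n = -m, -n
    return m, n


def same_rational(pair1, pair2):
    """Pairs define the same rational number exactly when m1 * n2 == m2 * n1."""
    (m1, n1), (m2, n2) = pair1, pair2
    return m1 * n2 == m2 * n1


print(reduce_pair(6, -4), reduce_pair(4, 6))
print(same_rational((6, -4), (-3, 2)), same_rational((1, 2), (2, 3)))
```

Two pairs are equivalent precisely when they reduce to the same normalized pair, which is the proportionality statement above read through the reduced representative.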




c. The Irrational Numbers 

Definition 5. The real numbers that are not rational are called irrational. 

The classical example of an irrational real number is $\sqrt{2}$, that is, the 
number $s \in \mathbb{R}$ such that $s > 0$ and $s^2 = 2$. By the Pythagorean theorem, the 
irrationality of $\sqrt{2}$ is equivalent to the assertion that the diagonal and side 
of a square are incommensurable. 


2 The notation Q comes from the first letter of the English word quotient, which 
in turn comes from the Latin quota, meaning the unit part of something, and 
quot, meaning how many. 

Thus we begin by verifying that there exists a real number s € R whose 
square equals 2, and then that s ¢ Q. 


Proof. Let $X$ and $Y$ be the sets of positive real numbers such that 
$\forall x \in X\ (x^2 < 2)$, $\forall y \in Y\ (2 < y^2)$. Since $1 \in X$ and $2 \in Y$, it follows that $X$ and 
$Y$ are nonempty sets. 

Further, since $(x < y) \Leftrightarrow (x^2 < y^2)$ for positive numbers $x$ and $y$, every 
element of $X$ is less than every element of $Y$. By the completeness axiom 
there exists $s \in \mathbb{R}$ such that $x \le s \le y$ for all $x \in X$ and all $y \in Y$. 

We shall show that $s^2 = 2$. 

If $s^2 < 2$, then, for example, the number $s + \frac{2 - s^2}{3s}$, which is larger than 
$s$, would have a square less than 2. Indeed, we know that $1 \in X$, so that 
$1^2 \le s^2 < 2$, and $0 < \Delta := 2 - s^2 \le 1$. It follows that 

$$\Bigl(s + \frac{\Delta}{3s}\Bigr)^2 = s^2 + 2 \cdot \frac{\Delta}{3} + \Bigl(\frac{\Delta}{3s}\Bigr)^2 < s^2 + 3 \cdot \frac{\Delta}{3} = s^2 + \Delta = 2.$$ 

Consequently, $\bigl(s + \frac{\Delta}{3s}\bigr) \in X$, which is inconsistent with the inequality $x \le s$ 
for all $x \in X$. 

If $2 < s^2$, then the number $s - \frac{s^2 - 2}{3s}$, which is smaller than $s$, would have 
a square larger than 2. Indeed, we know that $2 \in Y$, so that $2 < s^2 \le 2^2$ or 
$0 < \Delta := s^2 - 2 < 3$ and $0 < \frac{\Delta}{3} < 1$. Hence, 

$$\Bigl(s - \frac{\Delta}{3s}\Bigr)^2 = s^2 - 2 \cdot \frac{\Delta}{3} + \Bigl(\frac{\Delta}{3s}\Bigr)^2 > s^2 - 3 \cdot \frac{\Delta}{3} = s^2 - \Delta = 2,$$ 

and we have now contradicted the fact that $s$ is a lower bound of $Y$. 

Thus the only remaining possibility is that $s^2 = 2$. 

Let us show, finally, that $s \notin \mathbb{Q}$. Assume that $s \in \mathbb{Q}$ and let $\frac{m}{n}$ be an 
irreducible representation of $s$. Then $m^2 = 2 \cdot n^2$, so that $m^2$ is divisible by 2 
and therefore $m$ also is divisible by 2. But, if $m = 2k$, then $2k^2 = n^2$, and for 
the same reason, $n$ must be divisible by 2. But this contradicts the assumed 
irreducibility of the fraction $\frac{m}{n}$. □ 
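
The first step of the argument, pushing a number with square below 2 slightly upward while keeping its square below 2, can be tried out in exact arithmetic. This is only a sketch under stated assumptions: it starts from rational values s (Python's `Fraction`) with s ≥ 1 and s² < 2, and `push_up` is a hypothetical name for the map s ↦ s + Δ/(3s) with Δ = 2 − s².

```python
from fractions import Fraction


def push_up(s):
    """Given a rational s >= 1 with s*s < 2, return s + delta/(3s), delta = 2 - s*s."""
    delta = 2 - s * s
    if not (s >= 1 and delta > 0):
        raise ValueError("need s >= 1 and s*s < 2")
    return s + delta / (3 * s)


s = Fraction(1)
for _ in range(5):
    t = push_up(s)
    # t is strictly larger than s, yet its square is still below 2
    print(t, float(t), s < t, t * t < 2)
    s = t
```

Every iterate is a larger element of X, which is exactly why no s with s² < 2 can be an upper bound of X. Since every rational has a square different from 2, the loop could in principle be continued indefinitely.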


We have worked hard just now to prove that there exist irrational num- 
bers. We shall soon see that in a certain sense nearly all real numbers are 
irrational. It will be shown that the cardinality of the set of irrational num- 
bers is larger than that of the set of rational numbers and that in fact the 
former equals the cardinality of the set of real numbers. 

Among the irrational numbers we make a further distinction between the 
so-called algebraic irrational numbers and the transcendental numbers. 

A real number is called algebraic if it is the root of an algebraic equation 


$$a_0 x^n + \cdots + a_{n-1} x + a_n = 0$$ 


with rational (or equivalently, integer) coefficients. 
Otherwise the number is called transcendental. 

We shall see that the cardinality of the set of algebraic numbers is the 
same as that of the set of rational numbers, while the cardinality of the set 
of transcendental numbers is the same as that of the set of real numbers. 
For that reason the difficulties involved in exhibiting specific transcendental 
numbers — more precisely, proving that a given number is transcendental — 
seem at first sight paradoxical and unnatural. 

For example, it was not proved until 1882 that the classical geometric 
number $\pi$ is transcendental,³ and one of the famous Hilbert⁴ problems was 
to prove the transcendence of the number $\alpha^\beta$, where $\alpha$ is algebraic, 
$(\alpha > 0) \land (\alpha \ne 1)$ and $\beta$ is an irrational algebraic number (for example, 
$\alpha = 2$, $\beta = \sqrt{2}$). 


2.2.3 The Principle of Archimedes 


We now turn to the principle of Archimedes,⁵ which is important in both its 
theoretical aspect and the application of numbers in measurement and com- 
putations. We shall prove it using the completeness axiom (more precisely, 
the least-upper-bound principle, which is equivalent to the completeness ax- 
iom). In other axiom systems for the real numbers this fundamental principle 
is frequently included in the list of axioms. 

We remark that the propositions that we have proved up to now about the 
natural numbers and the integers have made no use at all of the complete- 
ness axiom. As will be seen below, the principle of Archimedes essentially 
reflects the properties of the natural numbers and integers connected with 
completeness. We begin with these properties. 


3 The number $\pi$ equals the ratio of the circumference of a circle to its diameter 
in Euclidean geometry. That is the reason this number has been conventionally 
denoted since the eighteenth century, following Euler, by $\pi$, which is the initial 
letter of the Greek word περιφέρεια, meaning periphery (circumference). The 
transcendence of $\pi$ was proved by the German mathematician F. Lindemann (1852-1939). 
It follows in particular from the transcendence of $\pi$ that it is impossible to 
construct a line segment of length $\pi$ with compass and straightedge (the problem 
of rectification of the circle), and also that the ancient problem of squaring the 
circle cannot be solved with compass and straightedge. 

4 D. Hilbert (1862-1943), outstanding German mathematician who stated 23 
problems from different areas of mathematics at the 1900 International Congress 
of Mathematicians in Paris. These problems came to be known as the “Hilbert 
problems”. The problem mentioned here (Hilbert’s seventh problem) was given 
an affirmative answer in 1934 by the Soviet mathematician A. O. Gel’fond 
(1906-1968) and the German mathematician T. Schneider (1911-1989). 

5 Archimedes (287-212 BCE), brilliant Greek scholar, about whom Leibniz, one 
of the founders of analysis, said, “When you study the works of Archimedes, you 
cease to be amazed by the achievements of modern mathematicians.” 




1°. Any nonempty subset of natural numbers that is bounded from above 
contains a maximal element. 


Proof. If $E \subset \mathbb{N}$ is the subset in question, then by the least-upper-bound 
lemma, $\exists!\, \sup E = s \in \mathbb{R}$. By definition of the least upper bound there is 
a natural number $n \in E$ satisfying the condition $s - 1 < n \le s$. But then, 
$n = \max E$, since a natural number that is larger than $n$ must be at least 
$n + 1$, and $n + 1 > s$. □ 


Corollaries 2°. The set of natural numbers is not bounded above. 


Proof. Otherwise there would exist a maximal natural number. But n < n+1. 
O 


3°. Any nonempty subset of the integers that is bounded from above contains 
a maximal element. 


Proof. The proof of 1° can be repeated verbatim, replacing N with Z. O 


4°. Any nonempty subset of integers that is bounded below contains a minimal 
element. 


Proof. One can, for example, repeat the proof of 1°, replacing N by Z and 
using the greatest-lower-bound principle instead of the least-upper-bound 
principle. 

Alternatively, one can pass to the negatives of the numbers (“change
signs”) and use what has been proved in 3°. □


5°. The set of integers is unbounded above and unbounded below. 

Proof. This follows from 3° and 4°, or directly from 2°. □


We can now state the principle of Archimedes. 


6°. (The principle of Archimedes). For any fixed positive number h and
any real number x there exists a unique integer k such that (k − 1)h ≤ x < kh.


Proof. Since Z is not bounded above, the set {n ∈ Z | x/h < n} is a nonempty
subset of the integers that is bounded below. Then (see 4°) it contains a
minimal element k, that is, (k − 1) ≤ x/h < k. Since h > 0, these inequalities
are equivalent to those given in the statement of the principle of Archimedes.
The uniqueness of k ∈ Z satisfying these two inequalities follows from the
uniqueness of the minimal element of a set of numbers (see Subsect. 2.1.3). □
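
For rational data the integer k in the principle of Archimedes can be computed directly as the least integer exceeding x/h. The following is only a minimal sketch under stated assumptions, not part of the original exposition: the function name archimedes_index is hypothetical, and exact rational arithmetic is assumed so that both inequalities can be verified without rounding.

```python
from fractions import Fraction
from math import floor


def archimedes_index(x, h):
    """Return the unique integer k with (k - 1) * h <= x < k * h, for h > 0."""
    x, h = Fraction(x), Fraction(h)
    if h <= 0:
        raise ValueError("h must be positive")
    # k is the least integer exceeding x / h, that is, floor(x / h) + 1.
    return floor(x / h) + 1


x, h = Fraction(7, 2), Fraction(1, 3)
k = archimedes_index(x, h)
print(k, (k - 1) * h <= x < k * h)  # → 11 True
```

Here x/h = 21/2, so the minimal integer exceeding it is 11, in agreement with the proof above.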


And now some corollaries: 


7°. For any positive number ε there exists a natural number n such that
0 < 1/n < ε.


Proof. By the principle of Archimedes there exists n ∈ Z such that 1 < ε · n.
Since 0 < 1 and 0 < ε, we have 0 < n. Thus n ∈ N and 0 < 1/n < ε. □


8°. If the number x ∈ R is such that 0 ≤ x and x < 1/n for all n ∈ N, then
x = 0.


Proof. The relation 0 < x is impossible by virtue of 7°. □


9°. For any numbers a, b ∈ R such that a < b there is a rational number
r ∈ Q such that a < r < b.


Proof. Taking account of 7°, we choose n ∈ N such that 0 < 1/n < b − a. By the
principle of Archimedes we can find a number m ∈ Z such that
(m − 1)/n ≤ a < m/n. Then m/n < b, since otherwise we would have
(m − 1)/n ≤ a < b ≤ m/n, from which it would follow that 1/n ≥ b − a.
Thus r = m/n ∈ Q and a < m/n < b. □
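
The proof of 9° is constructive, and its two steps can be traced numerically. The sketch below is an illustration only: the name rational_between is hypothetical, and the endpoints a and b are themselves assumed to be rational so that they can be represented exactly.

```python
from fractions import Fraction
from math import floor


def rational_between(a, b):
    """Return a rational r with a < r < b, following the proof of 9°."""
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    # Step 1 (corollary 7°): choose n with 0 < 1/n < b - a.
    n = floor(1 / (b - a)) + 1
    # Step 2 (principle of Archimedes with h = 1/n):
    # choose m with (m - 1)/n <= a < m/n.
    m = floor(a * n) + 1
    return Fraction(m, n)


print(rational_between(Fraction(1, 3), Fraction(1, 2)))  # → 3/7
```

For a = 1/3 and b = 1/2 the construction gives n = 7 and m = 3, and indeed 1/3 < 3/7 < 1/2.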


10°. For any number x ∈ R there exists a unique integer k ∈ Z such that
k ≤ x < k + 1.


Proof. This follows immediately from the principle of Archimedes. □


The number k just mentioned is denoted [x] and is called the integer part
of x. The quantity {x} := x − [x] is called the fractional part of x. Thus
x = [x] + {x}, and {x} ≥ 0.
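
As a small illustration of these definitions, the following sketch computes [x] and {x} for a rational x. The function name is hypothetical and exact rational input is assumed.

```python
from fractions import Fraction
from math import floor


def integer_and_fractional_part(x):
    """Return ([x], {x}) for a rational x."""
    x = Fraction(x)
    k = floor(x)       # the unique integer with k <= x < k + 1
    return k, x - k    # the fractional part satisfies 0 <= {x} < 1


print(integer_and_fractional_part(Fraction(-3, 2)))  # → (-2, Fraction(1, 2))
```

Note that for negative non-integers the integer part differs from truncation toward zero: [−3/2] = −2, whereas Python's int() applied to −1.5 gives −1.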


2.2.4 The Geometric Interpretation of the Set of Real Numbers 
and Computational Aspects of Operations with Real Numbers 


a. The Real Line In relation to real numbers we often use a descriptive
geometric language connected with a fact that you know in general terms
from school. By the axioms of geometry there is a one-to-one correspondence
f : L → R between the points of a line L and the set R of real numbers.
Moreover this correspondence is connected with the rigid motions of the line.
To be specific, if T is a parallel translation of the line L along itself, there
exists a number t ∈ R (depending only on T) such that f(T(x)) = f(x) + t
for each point x ∈ L.

The number f(x) corresponding to a point x ∈ L is called the coordinate of
x. In view of the one-to-one nature of the mapping f : L → R, the coordinate
of a point is often called simply a point. For example, instead of the phrase
“let us take the point whose coordinate is 1” we say “let us take the point 1”.
Given the correspondence f : L → R, we call the line L the coordinate axis
or the number axis or the real line. Because f is bijective, the set R itself is
also often called the real line and its points are called points of the real line.

As noted above, the bijective mapping f : L → R that defines coordinates
on L has the property that under a parallel translation T the coordinates of
the images of points of the line L differ from the coordinates of the points
themselves by a number t ∈ R, the same for every point. For this reason f
is determined completely by specifying the point that is to have coordinate
0 and the point that is to have coordinate 1, or more briefly, by the point 0, 
called the origin, and the point 1. The closed interval determined by these 
points is called the unit interval. The direction determined by the ray with 
origin at 0 containing 1 is called the positive direction and a motion in that 
direction (from 0 to 1) is called a motion from left to right. In accordance 
with this convention, 1 lies to the right of 0 and 0 to the left of 1. 

Under a parallel translation T that moves the origin x₀ to the point
x₁ = T(x₀) with coordinate 1, the coordinates of the images of all points are
one unit larger than those of their pre-images, and therefore we locate the
point x₂ = T(x₁) with coordinate 2, the point x₃ = T(x₂) with coordinate
3, ..., and the point xₙ₊₁ = T(xₙ) with coordinate n + 1, as well as the point
x₋₁ = T⁻¹(x₀) with coordinate −1, ..., the point x₋ₙ₋₁ = T⁻¹(x₋ₙ) with
coordinate −n − 1. In this way we obtain all points with integer coordinates
m ∈ Z.

Knowing how to double, triple,... the unit interval, we can use Thales’ 
theorem to partition this interval into n congruent subintervals. By taking 
the subinterval having an endpoint at the origin, we find that the coordinate 
of its other end, which we denote by x, satisfies the equation n · x = 1, that
is, x = 1/n. From this we find all points with rational coordinates m/n ∈ Q.

But there still remain points of L, since we know there are intervals in- 
commensurable with the unit interval. Each such point, like every other point 
of the line, divides the line into two rays, on each of which there are points 
with integer or rational coordinates. (This is a consequence of the original 
geometric principle of Archimedes.) Thus a point produces a partition, or, as 
it is called, a cut of Q into two nonempty sets X and Y corresponding to the 
rational points (points with rational coordinates) on the left-hand and right- 
hand rays. By the axiom of completeness, there is a number c that separates 
X and Y, that is, x ≤ c ≤ y for all x ∈ X and all y ∈ Y. Since X ∪ Y = Q, it
follows that sup X = s = i = inf Y. For otherwise, s < i and there would be a
rational number between s and i lying neither in X nor in Y. Thus s = i = c.
This uniquely determined number c is assigned to the corresponding point of 
the line. 

The assignment of coordinates to points of the line just described provides 
a visualizable model for both the order relation in R (hence the term “linear 
ordering”) and for the axiom of completeness or continuity in R, which in 
geometric language means that there are no “holes” in the line L, which would 
separate it into two pieces having no points in common. (Such a separation 
could only come about by use of some point of the line L.) 

We shall not go into further detail about the construction of the mapping 
f : L → R, since we shall invoke the geometric interpretation of the set of
real numbers only for the sake of visualizability and perhaps to bring into
play the reader’s very useful geometric intuition. As for the formal proofs,
just as before, they will rely either on the collection of facts we have obtained
from the axioms for the real numbers or directly on the axioms themselves. 

Geometric language, however, will be used constantly. 

We now introduce the following notation and terminology for the number 
sets listed below: 

]a, b[ := {x ∈ R | a < x < b} is the open interval ab;

[a, b] := {x ∈ R | a ≤ x ≤ b} is the closed interval ab;

]a, b] := {x ∈ R | a < x ≤ b} is the half-open interval ab containing b;

[a, b[ := {x ∈ R | a ≤ x < b} is the half-open interval ab containing a.


Definition 6. Open, closed, and half-open intervals are called numerical in- 
tervals or simply intervals. The numbers determining an interval are called 
its endpoints. 


The quantity b—a is called the length of the interval ab. If I is an interval, 
we shall denote its length by |I|. (The origin of this notation will soon become 
clear.)

The sets 


]a, +∞[ := {x ∈ R | a < x},    ]−∞, b[ := {x ∈ R | x < b},
[a, +∞[ := {x ∈ R | a ≤ x},    ]−∞, b] := {x ∈ R | x ≤ b}


and ]−∞, +∞[ := R are conventionally called unbounded intervals or infinite
intervals.

In accordance with this use of the symbols +∞ (read “plus infinity”)
and −∞ (read “minus infinity”) it is customary to denote the fact that the
numerical set X is not bounded above (resp. below), by writing sup X = +∞
(inf X = −∞).


Definition 7. An open interval containing the point x ∈ R will be called a
neighborhood of this point.


In particular, when δ > 0, the open interval ]x − δ, x + δ[ is called the
δ-neighborhood of x. Its length is 2δ.

The distance between points x, y ∈ R is measured by the length of the
interval having them as endpoints.

So as not to have to investigate which of the points is “left” and which is 
“right”, that is, whether x < y or y < x and whether the length is y — x or 
x — y, we can use the useful function 


$$|x| = \begin{cases}
x & \text{when } x > 0, \\
0 & \text{when } x = 0, \\
-x & \text{when } x < 0,
\end{cases}$$


which is called the modulus or absolute value of the number. 


Definition 8. The distance between x, y ∈ R is the quantity |x − y|.


The distance is nonnegative and equals zero only when the points x and
y are the same. The distance from x to y is the same as the distance from y
to x, since |x − y| = |y − x|. Finally, if z ∈ R, then |x − y| ≤ |x − z| + |z − y|.
That is, the so-called triangle inequality holds.

The triangle inequality follows from a property of the absolute value that
is also called the triangle inequality (since it can be obtained from the
preceding triangle inequality by setting z = 0 and replacing y by −y). To be
specific, the inequality

$$|x + y| \le |x| + |y|$$

holds for any numbers x and y, and equality holds only when the numbers x
and y are both nonnegative or both nonpositive.


Proof. If 0 ≤ x and 0 ≤ y, then 0 ≤ x + y, |x + y| = x + y, |x| = x, and
|y| = y, so that equality holds in this case.

If x ≤ 0 and y ≤ 0, then x + y ≤ 0, |x + y| = −(x + y) = −x − y, |x| = −x,
|y| = −y, and again we have equality.

Now suppose one of the numbers is negative and the other positive, for
example, x < 0 < y. Then either x < x + y ≤ 0 or 0 ≤ x + y < y. In the first
case |x + y| < |x|, and in the second case |x + y| < |y|, so that in both cases
|x + y| < |x| + |y|. □


Using the principle of induction, one can verify that

$$|x_1 + \cdots + x_n| \le |x_1| + \cdots + |x_n|,$$

and equality holds if and only if the numbers x₁, ..., xₙ are all nonnegative
or all nonpositive.

The number (a + b)/2 is often called the midpoint or center of the interval with
endpoints a and b since it is equidistant from the endpoints of the interval.

In particular, a point x ∈ R is the center of its δ-neighborhood ]x − δ, x + δ[
and all points of the δ-neighborhood lie at a distance from x less than δ.


b. Defining a Number by Successive Approximations In measuring a 
real physical quantity, we obtain a number that, as a rule, changes when the 
measurement is repeated, especially if one changes either the method of mak- 
ing the measurement or the instrument used. Thus the result of measurement 
is usually an approximate value of the quantity being sought. The quality or 
precision of a measurement is characterized, for example, by the magnitude 
of the possible discrepancy between the true value of the quantity and the 
value obtained for it by measurement. When this is done, it may happen 
that we can never exhibit the exact value of the quantity (if it exists theo- 
retically). Taking a more constructive position, however, we may (or should) 
consider that we know the desired quantity completely if we can measure it 
with any preassigned precision. Taking this position is tantamount to
identifying the number with a sequence⁶ of more and more precise approximations
by numbers obtained from measurement. But every measurement is a finite 
set of comparisons with some standard or with a part of the standard com- 
mensurable with it, so that the result of the measurement will necessarily be 
expressed in terms of natural numbers, integers, or, more generally, rational 
numbers. Hence theoretically the whole set of real numbers can be described 
in terms of sequences of rational numbers by constructing, after due analysis, 
a mathematical copy or, better expressed, a model of what people do with 
numbers who have no notion of their axiomatic description. The latter add 
and multiply the approximate values rather than the values being measured, 
which are unknown to them. (To be sure, they do not always know how to 
say what relation the result of these operations has to the result that would 
be obtained if the computations were carried out with the exact values. We 
shall discuss this question below.) 

Having identified a number with a sequence of approximations to it, we 
should then, for example, add the sequences of approximate values when we 
wish to add two numbers. The new sequence thus obtained must be regarded 
as a new number, called the sum of the first two. But is it a number? The sub- 
tlety of the question resides in the fact that not every randomly constructed 
sequence is the sequence of arbitrarily precise approximations to some quan- 
tity. That is, one still has to learn how to determine from the sequence itself 
whether it represents some number or not. Another question that arises in 
the attempt to make a mathematical copy of operations with approximate 
numbers is that different sequences may be approximating sequences for the 
same quantity. The relation between sequences of approximations defining 
a number and the numbers themselves is approximately the same as that 
between a point on a map and an arrow on the map indicating the point. 
The arrow determines the point, but the point determines only the tip of the 
arrow, and does not exclude the use of a different arrow that may happen to 
be more convenient. 

A precise description of these problems was given by Cauchy,⁷ who carried
out the entire program of constructing a model of the real numbers, which we 
have only sketched. One may hope that after you study the theory of limits 
you will be able to repeat these constructions independently of Cauchy. 

What has been said up to now, of course, makes no claim to mathematical 
rigor. The purpose of this informal digression has been to direct the reader’s 
attention to the theoretical possibility that more than one natural model of 
the real numbers may exist. I have also tried to give a picture of the relation
of numbers to the world around us and to clarify the fundamental role of
natural and rational numbers. Finally, I wished to show that approximate
computations are both natural and necessary.


6 If n is the number of the measurement and xₙ the result of that measurement,
the correspondence n ↦ xₙ is simply a function f : N → R of a natural-number
argument, that is, by definition a sequence (in this case a sequence of numbers).
Section 3.1 is devoted to a detailed study of numerical sequences.

7 A. Cauchy (1789-1857), French mathematician, one of the most active creators
of the language of mathematics and the machinery of classical analysis.

The next part of the present section is devoted to simple but important 
estimates of the errors that arise in arithmetic operations on approximate 
quantities. These estimates will be used below and are of independent interest. 

We now give precise statements. 


Definition 9. If x is the exact value of a quantity and x̃ a known
approximation to the quantity, the numbers

$$\Delta(\tilde{x}) := |x - \tilde{x}|$$

and

$$\delta(\tilde{x}) := \frac{\Delta(\tilde{x})}{|\tilde{x}|}$$

are called respectively the absolute and relative error of approximation by x̃.
The relative error is not defined when x̃ = 0.


Since the value x is unknown, the values of Δ(x̃) and δ(x̃) are also
unknown. However, one usually knows some upper bounds Δ(x̃) < Δ and
δ(x̃) < δ for these quantities. In this case we say that the absolute or relative
error does not exceed Δ or δ respectively. In practice we need to deal only
with estimates for the errors, so that the quantities Δ and δ themselves are
often called the absolute and relative errors. But we shall not do this.

The notation x = x̃ ± Δ means that x̃ − Δ ≤ x ≤ x̃ + Δ.

For example,


gravitational constant      G = (6.672598 ± 0.00085) · 10⁻¹¹ N·m²/kg²,
speed of light in vacuo     c = 299792458 m/s (exactly),
Planck’s constant           h = (6.6260755 ± 0.0000040) · 10⁻³⁴ J·s,
charge of an electron       e = (1.60217733 ± 0.00000049) · 10⁻¹⁹ Coul,
rest mass of an electron    mₑ = (9.1093897 ± 0.0000054) · 10⁻³¹ kg.


The main indicator of the precision of a measurement is the relative error
in approximation, usually expressed as a percent.

Thus in the examples just given the relative errors are at most (in order):


13·10⁻⁵; 0; 6·10⁻⁷; 31·10⁻⁸; 6·10⁻⁷

or, as percents of the measured values,

13·10⁻³%; 0%; 6·10⁻⁵%; 31·10⁻⁶%; 6·10⁻⁵%.
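
The relative errors just listed are the ratios Δ/|x̃| of Definition 9, rounded to one or two significant digits. The following sketch recomputes them from the data of the table. It is an illustration only: the dictionary layout and the function name relative_error_bound are assumptions made here, and ordinary floating-point arithmetic is used.

```python
# (approximate value, bound for the absolute error). The common power of ten
# and the units are omitted because they cancel in the ratio.
measurements = {
    "gravitational constant G": (6.672598, 0.00085),
    "Planck's constant h": (6.6260755, 0.0000040),
    "charge of an electron e": (1.60217733, 0.00000049),
    "rest mass of an electron m_e": (9.1093897, 0.0000054),
}


def relative_error_bound(approx, abs_bound):
    """Return abs_bound / |approx|; not defined for a zero approximation."""
    if approx == 0:
        raise ValueError("relative error is not defined when the approximation is 0")
    return abs_bound / abs(approx)


for name, (approx, abs_bound) in measurements.items():
    print(f"{name}: {relative_error_bound(approx, abs_bound):.1e}")
```

Multiplying each ratio by 100 gives the corresponding percentage.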


We now estimate the errors that arise in arithmetic operations with ap- 
proximate quantities. 
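Proposition. If

$$|x - \tilde{x}| = \Delta(\tilde{x}), \qquad |y - \tilde{y}| = \Delta(\tilde{y}),$$

then

$$\Delta(\tilde{x} + \tilde{y}) := |(x + y) - (\tilde{x} + \tilde{y})| \le \Delta(\tilde{x}) + \Delta(\tilde{y}), \tag{2.1}$$

$$\Delta(\tilde{x} \cdot \tilde{y}) := |x \cdot y - \tilde{x} \cdot \tilde{y}| \le |\tilde{x}|\Delta(\tilde{y}) + |\tilde{y}|\Delta(\tilde{x}) + \Delta(\tilde{x}) \cdot \Delta(\tilde{y}); \tag{2.2}$$

if, in addition,

$$y \ne 0, \quad \tilde{y} \ne 0 \quad \text{and} \quad \delta(\tilde{y}) = \frac{\Delta(\tilde{y})}{|\tilde{y}|} < 1,$$

then

$$\Delta\left(\frac{\tilde{x}}{\tilde{y}}\right) := \left|\frac{x}{y} - \frac{\tilde{x}}{\tilde{y}}\right| \le \frac{|\tilde{x}|\Delta(\tilde{y}) + |\tilde{y}|\Delta(\tilde{x})}{\tilde{y}^2} \cdot \frac{1}{1 - \delta(\tilde{y})}. \tag{2.3}$$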


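Proof. Let x = x̃ + α and y = ỹ + β. Then

$$\begin{aligned}
\Delta(\tilde{x} + \tilde{y}) &= |(x + y) - (\tilde{x} + \tilde{y})| = |\alpha + \beta| \le |\alpha| + |\beta| = \Delta(\tilde{x}) + \Delta(\tilde{y}), \\
\Delta(\tilde{x} \cdot \tilde{y}) &= |xy - \tilde{x} \cdot \tilde{y}| = |(\tilde{x} + \alpha)(\tilde{y} + \beta) - \tilde{x} \cdot \tilde{y}| \\
&= |\tilde{x}\beta + \tilde{y}\alpha + \alpha\beta| \le |\tilde{x}|\,|\beta| + |\tilde{y}|\,|\alpha| + |\alpha\beta| \\
&= |\tilde{x}|\Delta(\tilde{y}) + |\tilde{y}|\Delta(\tilde{x}) + \Delta(\tilde{x}) \cdot \Delta(\tilde{y}), \\
\Delta\left(\frac{\tilde{x}}{\tilde{y}}\right) &= \left|\frac{x}{y} - \frac{\tilde{x}}{\tilde{y}}\right| = \left|\frac{x\tilde{y} - y\tilde{x}}{y\tilde{y}}\right| \\
&= \left|\frac{(\tilde{x} + \alpha)\tilde{y} - (\tilde{y} + \beta)\tilde{x}}{\tilde{y}^2}\right| \cdot \left|\frac{1}{1 + \beta/\tilde{y}}\right| \le \frac{|\tilde{x}|\,|\beta| + |\tilde{y}|\,|\alpha|}{\tilde{y}^2} \cdot \frac{1}{1 - \delta(\tilde{y})} \\
&= \frac{|\tilde{x}|\Delta(\tilde{y}) + |\tilde{y}|\Delta(\tilde{x})}{\tilde{y}^2} \cdot \frac{1}{1 - \delta(\tilde{y})}.
\end{aligned}$$

□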


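These estimates for the absolute errors imply the following estimates for
the relative errors:

$$\delta(\tilde{x} + \tilde{y}) \le \frac{\Delta(\tilde{x}) + \Delta(\tilde{y})}{|\tilde{x} + \tilde{y}|}, \tag{2.1'}$$

$$\delta(\tilde{x} \cdot \tilde{y}) \le \delta(\tilde{x}) + \delta(\tilde{y}) + \delta(\tilde{x}) \cdot \delta(\tilde{y}), \tag{2.2'}$$

$$\delta\left(\frac{\tilde{x}}{\tilde{y}}\right) \le \frac{\delta(\tilde{x}) + \delta(\tilde{y})}{1 - \delta(\tilde{y})}. \tag{2.3'}$$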


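In practice, when working with sufficiently good approximations, we have
Δ(x̃) · Δ(ỹ) ≈ 0, δ(x̃) · δ(ỹ) ≈ 0, and 1 − δ(ỹ) ≈ 1, so that one can use the
following simplified and useful, but formally incorrect, versions of formulas
(2.2), (2.3), (2.2′), and (2.3′):

$$\Delta(\tilde{x} \cdot \tilde{y}) \le |\tilde{x}|\Delta(\tilde{y}) + |\tilde{y}|\Delta(\tilde{x}),$$

$$\Delta\left(\frac{\tilde{x}}{\tilde{y}}\right) \le \frac{|\tilde{x}|\Delta(\tilde{y}) + |\tilde{y}|\Delta(\tilde{x})}{\tilde{y}^2},$$

$$\delta(\tilde{x} \cdot \tilde{y}) \le \delta(\tilde{x}) + \delta(\tilde{y}),$$

$$\delta\left(\frac{\tilde{x}}{\tilde{y}}\right) \le \delta(\tilde{x}) + \delta(\tilde{y}).$$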
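
The exact estimates (2.1), (2.2), and (2.3) can be checked on a concrete pair of numbers. The sketch below is an illustration, not part of the original text: the function names and the sample values are chosen here, and exact rational arithmetic is assumed so that the comparisons are not affected by rounding.

```python
from fractions import Fraction as F


def sum_bound(dx, dy):
    """Right-hand side of (2.1)."""
    return dx + dy


def product_bound(xa, ya, dx, dy):
    """Right-hand side of (2.2)."""
    return abs(xa) * dy + abs(ya) * dx + dx * dy


def quotient_bound(xa, ya, dx, dy):
    """Right-hand side of (2.3); requires ya != 0 and dy / |ya| < 1."""
    if ya == 0 or dy >= abs(ya):
        raise ValueError("(2.3) requires ya != 0 and dy / |ya| < 1")
    return (abs(xa) * dy + abs(ya) * dx) / ya**2 / (1 - dy / abs(ya))


# Exact values x, y and their approximations xa, ya (illustrative numbers).
x, xa = F(203, 100), F(2)
y, ya = F(498, 100), F(5)
dx, dy = abs(x - xa), abs(y - ya)   # absolute errors of the approximations

print(abs((x + y) - (xa + ya)) <= sum_bound(dx, dy))            # → True
print(abs(x * y - xa * ya) <= product_bound(xa, ya, dx, dy))    # → True
print(abs(x / y - xa / ya) <= quotient_bound(xa, ya, dx, dy))   # → True
```

For these particular values the quotient estimate holds with equality (both sides equal 19/2490), which shows that (2.3) cannot be improved in general.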


Formulas (2.3) and (2.3′) show that it is necessary to avoid dividing by a
number that is near zero and also to avoid using rather crude approximations
in which ỹ or 1 − δ(ỹ) is small in absolute value.

Formula (2.1′) warns against adding approximate quantities if they are
close to each other in absolute value but opposite in sign, since then |x̃ + ỹ|
is close to zero.

In all these cases, the errors may increase sharply. 

For example, suppose your height has been measured twice by some device,
and the precision of the measurement is ±0.5 cm. Suppose a sheet of
paper was placed under your feet before the second measurement. It may
nevertheless happen that the results of the measurement are as follows:
H₁ = (200 ± 0.5) cm and H₂ = (199.8 ± 0.5) cm respectively.

It does not make sense to try to find the thickness of the paper in the
form of the difference H₂ − H₁, from which it would follow only that the
thickness of the paper is not larger than 0.8 cm. That would of course be a
crude reflection (if indeed one could even call it a “reflection”) of the true
situation.

However, it is worthwhile to consider another more hopeful computational 
effect through which comparatively precise measurements can be carried out 
with crude devices. For example, if the device just used for measuring your
height was used to measure the thickness of 1000 sheets of the same paper,
and the result was (20 ± 0.5) cm, then the thickness of one sheet of paper
is (0.02 ± 0.0005) cm, which is (0.2 ± 0.005) mm, as follows from formula
(2.1).

That is, with an absolute error not larger than 0.005 mm, the thickness of
one sheet is 0.2 mm. The relative error in this measurement is at most 0.025
or 2.5%.
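
The arithmetic of both paper-thickness examples can be retraced as follows. This is a sketch with variable names chosen here for illustration, and it assumes that the number of sheets, 1000, is known exactly.

```python
from fractions import Fraction as F

# Two height measurements in cm, each with absolute error at most 0.5 cm.
h1, h2, err = F(200), F(1998, 10), F(1, 2)

# By (2.1) the absolute error of the difference is at most err + err.
diff, diff_err = h2 - h1, 2 * err
print(float(diff - diff_err), float(diff + diff_err))  # → -1.2 0.8

# One measurement of a stack of 1000 sheets: dividing by the exact number 1000
# divides both the measured value and its absolute error by 1000.
stack, stack_err, sheets = F(20), F(1, 2), 1000
sheet, sheet_err = stack / sheets, stack_err / sheets
print(float(sheet * 10), float(sheet_err * 10))  # in mm → 0.2 0.005
print(float(sheet_err / sheet))                  # relative error → 0.025
```

The difference of the two heights only confines the thickness to the interval from −1.2 cm to 0.8 cm, whereas the measurement of the stack determines it to within 0.005 mm.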

This idea can be developed and has been proposed, for example, as a way 
of detecting a weak periodic signal amid the larger random static usually 
called white noise. 


c. The Positional Computation System It was stated above that every 
real number can be presented as a sequence of rational approximations. We 
now recall a method, which is important when it comes to computation, for 
constructing in a uniform way a sequence of such rational approximations 
for every real number. This method leads to the positional computation sys- 
tem. 
Lemma. If a number q > 1 is fixed, then for every positive number x ∈ R
there exists a unique integer k ∈ Z such that

$$q^{k-1} \le x < q^k.$$


Proof. We first verify that the set of numbers of the form qᵏ, k ∈ N, is
not bounded above. If it were, it would have a least upper bound s, and by
definition of the least upper bound, there would be a natural number m ∈ N
such that s/q < qᵐ ≤ s. But then s < qᵐ⁺¹, so that s could not be an upper
bound of the set.

Since 1 < q, it follows that qᵐ < qⁿ when m < n for all m, n ∈ Z. Hence
we have also shown that for every real number c ∈ R there exists a natural
number N ∈ N such that c < qⁿ for all n > N.

It follows that for any ε > 0 there exists M ∈ N such that 1/qᵐ < ε for all
natural numbers m > M.

Indeed, it suffices to set c = 1/ε and N = M; then 1/ε < qᵐ when m > M.

Thus the set of integers m ∈ Z satisfying the inequality x < qᵐ for x > 0
is bounded below. It therefore has a minimal element k, which obviously will
be the one we are seeking, since, for this integer, qᵏ⁻¹ ≤ x < qᵏ.

The uniqueness of such an integer k follows from the fact that if m, n ∈ Z
and, for example, m < n, then m ≤ n − 1. Hence if q > 1, then qᵐ ≤ qⁿ⁻¹.

Indeed, it can be seen from this remark that the inequalities qᵐ⁻¹ ≤ x < qᵐ
and qⁿ⁻¹ ≤ x < qⁿ, which imply qⁿ⁻¹ ≤ x < qᵐ, are incompatible if m ≠ n. □
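
The integer k of the lemma can be found by a direct search, whose termination is guaranteed by the unboundedness of the powers of q established in the proof. The following is a minimal sketch with a hypothetical function name, assuming that x and q are rational so that the powers can be compared exactly.

```python
from fractions import Fraction


def lemma_exponent(x, q):
    """Return the integer k with q**(k - 1) <= x < q**k, for x > 0 and q > 1."""
    x, q = Fraction(x), Fraction(q)
    if x <= 0 or q <= 1:
        raise ValueError("need x > 0 and q > 1")
    k, power = 0, Fraction(1)      # invariant: power == q**k
    while power <= x:              # increase k until x < q**k
        k, power = k + 1, power * q
    while power / q > x:           # decrease k while x < q**(k - 1)
        k, power = k - 1, power / q
    return k


print(lemma_exponent(Fraction(12345, 100), 10))  # → 3
```

Indeed, 10² ≤ 123.45 < 10³.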


We shall use this lemma in the following construction. Fix q > 1 and take 
an arbitrary positive number x ∈ R. By the lemma we find a unique number
p ∈ Z such that

$$q^p \le x < q^{p+1}. \tag{2.4}$$


Definition 10. The number p satisfying (2.4) is called the order of x in the 
base q or (when q is fixed) simply the order of x.


By the principle of Archimedes, we find a unique natural number αₚ ∈ N
such that

$$\alpha_p q^p \le x < \alpha_p q^p + q^p. \tag{2.5}$$

Taking (2.4) into account, one can assert that αₚ ∈ {1, ..., q − 1}.

All of the subsequent steps in our construction will repeat the step we are 
about to take, starting from relation (2.5). 

It follows from relation (2.5) and the principle of Archimedes that there 
exists a unique number αₚ₋₁ ∈ {0, 1, ..., q − 1} such that

$$\alpha_p q^p + \alpha_{p-1} q^{p-1} \le x < \alpha_p q^p + \alpha_{p-1} q^{p-1} + q^{p-1}. \tag{2.6}$$


If we have made n such steps, obtaining the relation

$$\alpha_p q^p + \alpha_{p-1} q^{p-1} + \cdots + \alpha_{p-n} q^{p-n} \le x < \alpha_p q^p + \alpha_{p-1} q^{p-1} + \cdots + \alpha_{p-n} q^{p-n} + q^{p-n},$$

then by the principle of Archimedes there exists a unique number
αₚ₋ₙ₋₁ ∈ {0, 1, ..., q − 1} such that

$$\alpha_p q^p + \cdots + \alpha_{p-n} q^{p-n} + \alpha_{p-n-1} q^{p-n-1} \le x < \alpha_p q^p + \cdots + \alpha_{p-n} q^{p-n} + \alpha_{p-n-1} q^{p-n-1} + q^{p-n-1}.$$

Thus we have exhibited an algorithm by means of which a sequence of
numbers αₚ, αₚ₋₁, ..., αₚ₋ₙ, ... from the set {0, 1, ..., q − 1} is placed in
correspondence with the positive number x. Less formally, we have constructed
a sequence of rational numbers of the special form

$$r_n = \alpha_p q^p + \cdots + \alpha_{p-n} q^{p-n}, \tag{2.7}$$

and such that

$$r_n \le x < r_n + \frac{1}{q^{n-p}}. \tag{2.8}$$

In other words, we construct better and better approximations from
below and from above to the number x using the special sequence (2.7). The
symbol αₚ ... αₚ₋ₙ ... is a code for the entire sequence {rₙ}. To recover the
sequence {rₙ} from this symbol it is necessary to indicate the value of p, the
order of x.

For p ≥ 0 it is customary to place a period or comma after α₀; for p < 0,
the convention is to place |p| zeros left of αₚ and a period or comma right of
the leftmost zero (we recall that αₚ ≠ 0).

For example, when q = 10, 


123.45 := 1·10² + 2·10¹ + 3·10⁰ + 4·10⁻¹ + 5·10⁻²,
0.00123 := 1·10⁻³ + 2·10⁻⁴ + 3·10⁻⁵;


and when q = 2,

1000.001 := 1·2³ + 1·2⁻³.


Thus the value of a digit in the symbol αₚ ... αₚ₋ₙ ... depends on the position
it occupies relative to the period or comma.

With this convention, the symbol αₚ ... α₀ .... makes it possible to recover
the whole sequence of approximations.
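
The construction that starts from relations (2.4) and (2.5) can be carried out mechanically. The sketch below is an illustration under stated assumptions, not the author's own procedure: the function name q_ary_digits is hypothetical, the base q is assumed to be an integer not less than 2, and x is assumed to be a positive rational number so that every comparison is exact.

```python
from fractions import Fraction


def q_ary_digits(x, q, count):
    """Return (p, digits), where p is the order of x in base q and digits
    holds the first `count` digits alpha_p, alpha_(p-1), ... of x."""
    x = Fraction(x)
    if x <= 0 or not isinstance(q, int) or q < 2:
        raise ValueError("need x > 0 and an integer base q >= 2")
    # Find the order p with q**p <= x < q**(p + 1), as in (2.4).
    p, power = 0, Fraction(1)          # invariant: power == q**p
    while power * q <= x:
        p, power = p + 1, power * q
    while power > x:
        p, power = p - 1, power / q
    digits, rest = [], x
    for _ in range(count):
        digit = rest // power          # largest integer with digit * power <= rest
        digits.append(int(digit))
        rest -= digit * power          # now 0 <= rest < power, which is (2.8)
        power /= q
    return p, digits


print(q_ary_digits(Fraction(12345, 100), 10, 5))  # → (2, [1, 2, 3, 4, 5])
print(q_ary_digits(Fraction(65, 8), 2, 7))        # → (3, [1, 0, 0, 0, 0, 0, 1])
```

The second call reproduces the binary example 1000.001 = 2³ + 2⁻³ = 65/8.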

It can be seen by inequalities (2.8) (verify this!) that different sequences
{rₙ} and {r′ₙ}, and therefore different symbols αₚ ... α₀ .... and α′ₚ ... α′₀ ....,
correspond to different numbers x and x′.

We now answer the question whether some real number x ∈ R corresponds
to every symbol αₚ ... α₀ ..... The answer turns out to be negative.

We remark that by virtue of the algorithm just described for obtaining
the numbers αₚ₋ₙ ∈ {0, 1, ..., q − 1} successively, it cannot happen that all
these numbers from some point on are equal to q − 1.

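Indeed, if

$$r_n = \alpha_p q^p + \cdots + \alpha_{p-k} q^{p-k} + (q - 1) q^{p-k-1} + \cdots + (q - 1) q^{p-n}$$

for all n > k, that is,

$$r_n = r_k + \frac{1}{q^{k-p}} - \frac{1}{q^{n-p}}, \tag{2.9}$$

then by (2.8) we have

$$r_k + \frac{1}{q^{k-p}} - \frac{1}{q^{n-p}} \le x < r_k + \frac{1}{q^{k-p}}.$$

Then for any n > k

$$0 < r_k + \frac{1}{q^{k-p}} - x \le \frac{1}{q^{n-p}},$$

which, as we know from 8° above, is impossible.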

It is also useful to note that if at least one of the numbers
αₚ₋ₖ₋₁, ..., αₚ₋ₙ is less than q − 1, then instead of (2.9) we can write

$$r_n < r_k + \frac{1}{q^{k-p}} - \frac{1}{q^{n-p}}$$

or, what is the same

$$r_n + \frac{1}{q^{n-p}} < r_k + \frac{1}{q^{k-p}}. \tag{2.10}$$


We can now prove that any symbol αₚ ... α₀ .... composed of the numbers
αₖ ∈ {0, 1, ..., q − 1}, and in which there are numbers different from q − 1
with arbitrarily large indices, corresponds to some number x ≥ 0.

Indeed, from the symbol αₚ ... αₚ₋ₙ ... let us construct the sequence {rₙ}
of the form (2.7). By virtue of the relations r₀ ≤ r₁ ≤ ··· ≤ rₙ ≤ ···, taking
account of (2.9) and (2.10), we have

$$r_0 \le r_1 \le \cdots \le r_n \le \cdots < \cdots \le r_n + \frac{1}{q^{n-p}} \le \cdots \le r_1 + \frac{1}{q^{1-p}} \le r_0 + \frac{1}{q^{-p}}. \tag{2.11}$$

The strict inequalities in this last relation should be understood as follows:
every element of the left-hand sequence is less than every element of the
right-hand sequence. This follows from (2.10).

If we now take

$$x = \sup_{n \in \mathbb{N}} r_n \quad \Bigl(= \inf_{n \in \mathbb{N}} \bigl(r_n + q^{-(n-p)}\bigr)\Bigr),$$

then the sequence {rₙ} will satisfy conditions (2.7) and (2.8), that is, the
symbol αₚ ... αₚ₋ₙ ... corresponds to the number x ∈ R.

Thus, we have established a one-to-one correspondence between the positive
numbers x ∈ R and symbols of the form αₚ ... α₀, ... if p ≥ 0 or
0,0...0αₚ ... (with |p| zeros) if p < 0. The symbol assigned to x is called
the q-ary representation of x; the numbers that occur in the symbol are called
its digits, and the position of a digit relative to the period is called its rank.

We agree to assign to a number x < 0 the symbol for the positive number
−x, prefixed by a negative sign. Finally, we assign the symbol 0.0...0... to
the number 0.

In this way we have constructed the positional q-ary system of writing
real numbers.

The most useful systems are the decimal system (in common use) and for 
technical reasons the binary system (in electronic computers). Less common, 
but also used in some parts of computer engineering are the ternary and octal 
systems. 

Formulas (2.7) and (2.8) show that if only a finite number of digits 
are retained in the q-ary expression of x (or, if we wish, we may say that 
the others are replaced with zeros), then the absolute error of the result- 
ing approximation (2.7) for x does not exceed one unit in the last rank re- 
tained. 

This observation makes it possible to use the formulas obtained in Para- 
graph b to estimate the errors that arise when doing arithmetic operations 
on numbers as a result of replacing the exact numbers by the corresponding 
approximate values of the form (2.7). 

This last remark also has a certain theoretical value. To be specific, if 
we identify a real number x with its q-ary expression, as was suggested in 
Paragraph b, once we have learned to perform arithmetic operations di- 
rectly on the q-ary symbols, we will have constructed a new model of the 
real numbers, seemingly of greater value from the computational point of 
view. 

The main problems that need to be solved in this direction are the fol- 
lowing: 

To two q-ary symbols it is necessary to assign a new symbol representing 
their sum. It will of course be constructed one step at a time. To be specific, 
by adding more and more precise rational approximations of the original 
numbers, we shall obtain rational approximations corresponding to their sum. 
Using the remark made above, one can show that as the precision of the 
approximations of the terms increases, we shall obtain more and more q-ary 
digits of the sum, which will then not vary under subsequent improvements 
in the approximation. 

This same problem needs to be solved with respect to multiplication. 

Another, less constructive, route for passing from rational numbers to all 
real numbers is due to Dedekind. 

Dedekind identifies a real number with a cut in the set Q of rational 
numbers, that is, a partition of Q into two disjoint sets A and B such that 
a < b for all a ∈ A and all b ∈ B. Under this approach to real numbers
our axiom of completeness (continuity) becomes a well-known theorem of 
Dedekind. For that reason the axiom of completeness in the form we have 
given it is sometimes called Dedekind’s axiom. 

To summarize, in the present section we have exhibited the most important
classes of numbers. We have shown the fundamental role played by the
natural and rational numbers. It has been shown how the basic properties of
these numbers follow from the axiom system⁸ we have adopted. We have given
a picture of various models of the set of real numbers. We have discussed the 
computational aspects of the theory of real numbers: estimates of the errors 
arising during arithmetical operations with approximate magnitudes, and the 
q-ary positional computation system. 


2.2.5 Problems and Exercises 


1. Using the principle of induction, show that 

a) the sum x₁ + ··· + xₙ of real numbers is defined independently of the insertion
of parentheses to specify the order of addition;

b) the same is true of the product x₁ ··· xₙ;

c) |x₁ + ··· + xₙ| ≤ |x₁| + ··· + |xₙ|;

d) |x₁ ··· xₙ| = |x₁| ··· |xₙ|;

e) ((m, n ∈ N) ∧ (m < n)) ⇒ ((n − m) ∈ N);

f) (1 + x)ⁿ ≥ 1 + nx for x > −1 and n ∈ N, equality holding only when n = 1
or x = 0 (Bernoulli’s inequality);

g) $(a + b)^n = a^n + \frac{n}{1!} a^{n-1} b + \frac{n(n-1)}{2!} a^{n-2} b^2 + \cdots + \frac{n(n-1) \cdots 2}{(n-1)!} a b^{n-1} + b^n$
(Newton’s binomial formula).


2. a) Verify that Z and Q are inductive sets. 
b) Give examples of inductive sets different from N, Z, Q, and R. 


3. Show that an inductive set is not bounded above. 


4. a) An inductive set is infinite (that is, equipollent with one of its subsets different 
from itself). 


b) The set Eₙ = {x ∈ N | x ≤ n} is finite. (We denote card Eₙ by n.)


5. (The Euclidean algorithm) a) Let m, n ∈ N and m > n. Their greatest common
divisor (gcd (m, n) = d ∈ N) can be found in a finite number of steps using the
following algorithm of Euclid involving successive divisions with remainder.

$$\begin{aligned}
m &= q_1 n + r_1 && (r_1 < n), \\
n &= q_2 r_1 + r_2 && (r_2 < r_1), \\
r_1 &= q_3 r_2 + r_3 && (r_3 < r_2), \\
&\;\;\vdots \\
r_{k-1} &= q_{k+1} r_k + 0.
\end{aligned}$$

Then d = rₖ.
b) If d = gcd (m,n), one can choose numbers p,q € Z such that pm + qn = d; 
in particular, if m and n are relatively prime, then pm + qn = 1. 
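
The successive divisions of part a) and the coefficients p, q of part b) can be computed in a single loop. The following is a sketch of the standard extended form of Euclid's algorithm, offered as an illustration rather than as a solution prescribed by the text; the function name is hypothetical.

```python
def gcd_with_coefficients(m, n):
    """Return (d, p, q) with d = gcd(m, n) and p * m + q * n == d,
    for natural numbers m and n."""
    # Invariant: r0 == p0 * m + q0 * n and r1 == p1 * m + q1 * n.
    r0, p0, q0 = m, 1, 0
    r1, p1, q1 = n, 0, 1
    while r1 != 0:
        quotient = r0 // r1            # one division with remainder
        r0, r1 = r1, r0 - quotient * r1
        p0, p1 = p1, p0 - quotient * p1
        q0, q1 = q1, q0 - quotient * q1
    return r0, p0, q0


print(gcd_with_coefficients(252, 198))  # → (18, 4, -5)
```

Indeed, 4 · 252 − 5 · 198 = 1008 − 990 = 18. The loop stops because the remainders form a strictly decreasing sequence of nonnegative integers.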


8 It was stated by Hilbert in almost the form given above at the turn of the twen- 
tieth century. See for example Hilbert, D. Foundations of Geometry, Chap. III, 
§ 13. (Translated from the second edition of Grundlagen der Geometrie, La Salle, 
Illinois: Open Court Press, 1971. This section was based on Hilbert’s article 
“Über den Zahlbegriff” in Jahresbericht der deutschen Mathematikervereinigung 
8 (1900).). 
6. Try to give your own proof of the fundamental theorem of arithmetic (Paragraph
a in Subsect. 2.2.2).


7. If the product $m \cdot n$ of natural numbers is divisible by a prime $p$, that is,
$m \cdot n = p \cdot k$, where $k \in \mathbb{N}$, then either $m$ or $n$ is divisible by $p$.


8. It follows from the fundamental theorem of arithmetic that the set of prime 
numbers is infinite. 


9. Show that if the natural number $n$ is not of the form $k^m$, where $k, m \in \mathbb{N}$, then
the equation $x^m = n$ has no rational roots.


10. Show that the expression of a rational number in any q-ary computation system 
is periodic, that is, starting from some rank it consists of periodically repeating 
groups of digits. 
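
The periodicity claimed in Problem 10 can be observed by carrying out long division and watching the remainders. The sketch below is an illustration only; the function name `expand_fraction` and its return convention are assumptions, not notation from the text. It handles a proper fraction $m/n$ with $0 \le m < n$.

```python
def expand_fraction(m: int, n: int, base: int = 10) -> tuple[list[int], list[int]]:
    """Digits after the point of m/n (0 <= m < n) in the given base.

    Returns (preperiod, period); a terminating expansion has period [0].
    """
    seen = {}        # remainder -> position at which it first appeared
    digits = []
    r = m
    while r not in seen:
        seen[r] = len(digits)
        r *= base
        digits.append(r // n)   # next digit of the expansion
        r %= n                  # next remainder, always in {0, ..., n-1}
    start = seen[r]             # the expansion repeats from this position on
    return digits[:start], digits[start:]


print(expand_fraction(1, 7))
print(expand_fraction(1, 6))
print(expand_fraction(1, 3, base=2))
```

Since a remainder can take at most $n$ different values, some remainder must recur after at most $n$ steps, and from that point on the digits repeat.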


11. Let us call an irrational number $\alpha \in \mathbb{R}$ well approximated by rational numbers
if for any natural numbers $n, N \in \mathbb{N}$ there exists a rational number $\frac{p}{q}$ such that

$\left| \alpha - \frac{p}{q} \right| < \frac{1}{N q^n}$.


a) Construct an example of a well-approximated irrational number.


b) Prove that a well-approximated irrational number cannot be algebraic, that
is, it is transcendental (Liouville’s theorem).⁹


12. Knowing that $\frac{m}{n} := m \cdot n^{-1}$ by definition, where $m \in \mathbb{Z}$ and $n \in \mathbb{N}$, derive the
“rules” for addition, multiplication, and division of fractions, and also the condition 
for two fractions to be equal. 


13. Verify that the rational numbers Q satisfy all the axioms for real numbers 
except the axiom of completeness. 


14. Adopting the geometric model of the set of real numbers (the real line), show 
how to construct the numbers $a + b$, $a - b$, $ab$, and $\frac{a}{b}$ in this model.


15. a) Illustrate the axiom of completeness on the real line.


b) Prove that the least-upper-bound principle is equivalent to the axiom of 
completeness. 


16. a) If $A \subset B \subset \mathbb{R}$, then $\sup A \le \sup B$ and $\inf A \ge \inf B$.


b) Let $\mathbb{R} \supset X \ne \varnothing$ and $\mathbb{R} \supset Y \ne \varnothing$. If $x \le y$ for all $x \in X$ and all $y \in Y$, then
$X$ is bounded above, $Y$ is bounded below, and $\sup X \le \inf Y$.


c) If the sets $X, Y$ in b) are such that $X \cup Y = \mathbb{R}$, then $\sup X = \inf Y$.


d) If $X$ and $Y$ are the sets defined in c), then either $X$ has a maximal element
or $Y$ has a minimal element. (Dedekind’s theorem.)


e) (Continuation.) Show that Dedekind’s theorem is equivalent to the axiom of
completeness.


9 J. Liouville (1809-1882), French mathematician who wrote on complex analysis,
geometry, differential equations, number theory, and mechanics.


17. Let $A + B$ be the set of numbers of the form $a + b$ and $A \cdot B$ the set of numbers
of the form $a \cdot b$, where $a \in A \subset \mathbb{R}$ and $b \in B \subset \mathbb{R}$. Determine whether it is always
true that

a) $\sup(A + B) = \sup A + \sup B$,

b) $\sup(A \cdot B) = \sup A \cdot \sup B$.


18. Let $-A$ be the set of numbers of the form $-a$, where $a \in A \subset \mathbb{R}$. Show that
$\sup(-A) = -\inf A$.


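19. a) Show that for $n \in \mathbb{N}$ and $a > 0$ the equation $x^n = a$ has a positive root
(denoted $\sqrt[n]{a}$ or $a^{1/n}$).


b) Verify that for $a > 0$, $b > 0$, and $n, m \in \mathbb{N}$
$\sqrt[n]{ab} = \sqrt[n]{a} \cdot \sqrt[n]{b}$ and $\sqrt[n]{\sqrt[m]{a}} = \sqrt[n \cdot m]{a}$.


c) $(a^{1/n})^m = (a^m)^{1/n} =: a^{m/n}$ and $a^{1/n} \cdot a^{1/m} = a^{1/n + 1/m}$.

d) $(a^{m/n})^{-1} = (a^{-1})^{m/n} =: a^{-m/n}$.

e) Show that for all $r_1, r_2 \in \mathbb{Q}$

$a^{r_1} \cdot a^{r_2} = a^{r_1 + r_2}$ and $(a^{r_1})^{r_2} = a^{r_1 r_2}$.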


20. a) Show that the inclusion relation is a partial ordering relation on sets (but 
not a linear ordering!). 

b) Let $A$, $B$, and $C$ be sets such that $A \subset C$, $B \subset C$, $A \setminus B \ne \varnothing$, and $B \setminus A \ne \varnothing$.
We introduce a partial ordering into this triple of sets as in a). Exhibit the maximal 
and minimal elements of the set {A,B,C}. (Pay attention to the non-uniqueness!) 


21. a) Show that, just like the set $\mathbb{Q}$ of rational numbers, the set $\mathbb{Q}(\sqrt{n})$ of numbers
of the form $a + b\sqrt{n}$, where $a, b \in \mathbb{Q}$ and $n$ is a fixed natural number that is not the
square of any integer, is an ordered set satisfying the principle of Archimedes but
not the axiom of completeness.

b) Determine which axioms for the real numbers do not hold for $\mathbb{Q}(\sqrt{n})$ if the
standard arithmetic operations are retained in $\mathbb{Q}(\sqrt{n})$ but order is defined by the
rule $(a + b\sqrt{n} \le a' + b'\sqrt{n}) := ((b < b') \vee ((b = b') \wedge (a \le a')))$. Will $\mathbb{Q}(\sqrt{n})$ now
satisfy the principle of Archimedes?

c) Order the set $P[x]$ of polynomials with rational or real coefficients by
specifying that

$P_m(x) = a_0 + a_1 x + \cdots + a_m x^m > 0$, if $a_m > 0$.

d) Show that the set $\mathbb{Q}(x)$ of rational fractions

$R_{m,n} = \frac{a_0 + a_1 x + \cdots + a_m x^m}{b_0 + b_1 x + \cdots + b_n x^n}$

with coefficients in $\mathbb{Q}$ or $\mathbb{R}$ becomes an ordered field, but not an Archimedean
ordered field, when the order relation $R_{m,n} > 0$ is defined to mean $a_m b_n > 0$
and the usual arithmetic operations are introduced. This means that the principle
of Archimedes cannot be deduced from the other axioms for $\mathbb{R}$ without using the
axiom of completeness.

22. Let $n \in \mathbb{N}$ and $n > 1$. In the set $E_n = \{0, 1, \ldots, n - 1\}$ we define the sum and
product of two elements as the remainders when the usual sum and product in $\mathbb{R}$
are divided by $n$. With these operations defined on it, the set $E_n$ is denoted $\mathbb{Z}_n$.


a) Show that if $n$ is not a prime number, then there are nonzero numbers $m, k$
in $\mathbb{Z}_n$ such that $m \cdot k = 0$. (Such numbers are called zero divisors.) This means that
in $\mathbb{Z}_n$ the equation $a \cdot b = c \cdot b$ does not imply that $a = c$, even when $b \ne 0$.


b) Show that if $p$ is prime, then there are no zero divisors in $\mathbb{Z}_p$ and $\mathbb{Z}_p$ is a
field.


c) Show that, no matter what the prime $p$, $\mathbb{Z}_p$ cannot be ordered in a way
consistent with the arithmetic operations on it.
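
For small moduli, parts a) and b) of Problem 22 can be checked by brute force before attempting a proof. The following Python sketch is only an experiment under that assumption; the helper names `zero_divisors` and `is_field` are hypothetical and do not come from the text.

```python
def zero_divisors(n: int) -> list[tuple[int, int]]:
    """Pairs (m, k), 1 <= m <= k < n, whose product is 0 in Z_n."""
    return [(m, k)
            for m in range(1, n)
            for k in range(m, n)
            if (m * k) % n == 0]


def is_field(n: int) -> bool:
    """True if every nonzero element of Z_n has a multiplicative inverse."""
    return all(any((a * b) % n == 1 for b in range(1, n))
               for a in range(1, n))


print(zero_divisors(6), is_field(6))
print(zero_divisors(7), is_field(7))
```

A finite search of this kind proves nothing for general $n$, but it makes the role of primality visible: the zero divisors found for composite $n$ are exactly the pairs whose ordinary product is a multiple of $n$.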


23. Show that if $\mathbb{R}$ and $\mathbb{R}'$ are two models of the set of real numbers and $f : \mathbb{R} \to \mathbb{R}'$
is a mapping such that $f(x + y) = f(x) + f(y)$ and $f(x \cdot y) = f(x) \cdot f(y)$ for any
$x, y \in \mathbb{R}$, then

a) $f(0) = 0'$;

b) $f(1) = 1'$ if $f(x) \not\equiv 0'$, which we shall henceforth assume;

c) $f(m) = m'$ where $m \in \mathbb{Z}$ and $m' \in \mathbb{Z}'$, and the mapping $f : \mathbb{Z} \to \mathbb{Z}'$ is
injective and preserves the order.

d) $f\left(\frac{m}{n}\right) = \frac{m'}{n'}$, where $m, n \in \mathbb{Z}$, $n \ne 0$, $m', n' \in \mathbb{Z}'$, $n' \ne 0'$, $f(m) = m'$,
$f(n) = n'$. Thus $f : \mathbb{Q} \to \mathbb{Q}'$ is a bijection that preserves order.

e) $f : \mathbb{R} \to \mathbb{R}'$ is a bijective mapping that preserves order.


24. On the basis of the preceding exercise and the axiom of completeness, show
that the axiom system for the set of real numbers determines it completely up to an
isomorphism (method of realizing it), that is, if $\mathbb{R}$ and $\mathbb{R}'$ are two sets satisfying these
axioms, then there exists a one-to-one correspondence $f : \mathbb{R} \to \mathbb{R}'$ that preserves the
arithmetic operations and the order: $f(x + y) = f(x) + f(y)$, $f(x \cdot y) = f(x) \cdot f(y)$,
and $(x \le y) \Leftrightarrow (f(x) \le f(y))$.


25. A number $x$ is represented on a computer as

$x = \pm q^p \sum_{n=1}^{k} \frac{\alpha_n}{q^n}$,

where $p$ is the order of $x$ and $M = \sum_{n=1}^{k} \frac{\alpha_n}{q^n}$ is the mantissa of the number $x$
($\frac{1}{q} \le M < 1$).
Now a computer works only with a certain range of numbers: for $q = 2$ usually
$|p| \le 64$, and $k = 35$. Evaluate this range in the decimal system.
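
One way to check an answer to Problem 25 is to compute the extreme values exactly. The sketch below assumes the normalization $\frac{1}{q} \le M < 1$ and the bound $|p| \le 64$ as read from the statement; the variable names are illustrative only.

```python
from fractions import Fraction

q, k, p_max = 2, 35, 64

m_min = Fraction(1, q)              # smallest normalized mantissa, 0.100...0 in base q
m_max = 1 - Fraction(1, q ** k)     # largest mantissa, 0.111...1 with k digits

smallest = m_min * Fraction(q) ** (-p_max)   # least positive representable number
largest = m_max * q ** p_max                 # greatest representable number

print(f"{float(smallest):.3e}  {float(largest):.3e}")
```

Under these assumptions the positive representable numbers lie between $2^{-65}$ (about $2.7 \cdot 10^{-20}$) and $2^{64} - 2^{29}$ (about $1.8 \cdot 10^{19}$).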


26. a) Write out the (6 x 6) multiplication table for multiplication in base 6. 


b) Using the result of a), multiply “columnwise” in the base-6 system

$(532)_6 \times (145)_6$

and check your work by repeating the computation in the decimal system.


c) Perform the “long” division

$(1301)_6 \div (25)_6$

and check your work by repeating the computation in the decimal system.


d) Perform the “columnwise” addition

$(4052)_6 + (3125)_6$.


27. Write $(100)_{10}$ in the binary and ternary systems.
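
Hand computations in Problems 26 and 27 can be verified mechanically by converting to the decimal system and back. The sketch below is not from the text; `to_base` is a hypothetical helper, and Python's built-in `int(text, base)` is used for the reverse conversion.

```python
def to_base(value: int, base: int) -> str:
    """Positional representation of a nonnegative integer (base <= 10)."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, digit = divmod(value, base)   # peel off the lowest-order digit
        digits.append(str(digit))
    return "".join(reversed(digits))


a, b = int("532", 6), int("145", 6)
print(to_base(a * b, 6))                              # product of Problem 26 b)

quotient, remainder = divmod(int("1301", 6), int("25", 6))
print(to_base(quotient, 6), to_base(remainder, 6))    # division of Problem 26 c)

print(to_base(int("4052", 6) + int("3125", 6), 6))    # addition of Problem 26 d)
print(to_base(100, 2), to_base(100, 3))               # Problem 27
```

The script only confirms a result; the point of the exercise remains the columnwise computation with the base-6 multiplication table.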

28. a) Show that along with the unique representation of an integer as

$(\alpha_n \alpha_{n-1} \ldots \alpha_0)_3$,

where $\alpha_i \in \{0, 1, 2\}$, it can also be written as

$(\beta_n \beta_{n-1} \ldots \beta_0)_3$,

where $\beta_i \in \{-1, 0, 1\}$.

b) What is the largest number of coins from which one can detect a counterfeit 
in three weighings with a pan balance, if it is known in advance only that the 
counterfeit coin differs in weight from the other coins? 
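
The representation with digits $-1, 0, 1$ in Problem 28 a) can be produced by the usual division by 3 with one change: a remainder of 2 is replaced by $-1$ and a carry is passed to the next rank. The following Python sketch illustrates this; the function name `balanced_ternary` is an assumption made for the example.

```python
def balanced_ternary(value: int) -> list[int]:
    """Digits beta_n, ..., beta_0 in {-1, 0, 1} with value = sum(beta_i * 3**i)."""
    if value == 0:
        return [0]
    digits = []
    while value != 0:
        r = value % 3           # Python returns 0, 1 or 2, also for negative values
        if r == 2:
            r = -1              # write -1 here and carry 1 into the next rank
        digits.append(r)
        value = (value - r) // 3
    return digits[::-1]         # highest rank first


print(balanced_ternary(5))
print(balanced_ternary(-5))
```

Each weighing on a pan balance has three outcomes, which is why digits $-1, 0, 1$ (left pan, off the scale, right pan) are the natural bookkeeping device for part b).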


29. What is the smallest number of questions to be answered “yes” or “no” that 
one must pose in order to be sure of determining a 7-digit telephone number? 


30. a) How many different numbers can one define using 20 decimal digits (for 
example, two ranks with 10 possible digits in each)? Answer the same question for 
the binary system. Which system does a comparison of the results favor in terms 
of efficiency? 

b) Evaluate the number of different numbers one can write, having at one’s
disposal $n$ digits of a $q$-ary system. (Answer: $q^{n/q}$.)

c) Draw the graph of the function $f(x) = x^{n/x}$ over the set of natural-number
values of the argument and compare the efficiency of the different systems of
computation.
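
The comparison asked for in Problem 30 c) can be tabulated before drawing the graph. The sketch below assumes a stock of $n = 60$ digits, chosen only because 60 is divisible by each base considered; the dictionary name `capacity` is illustrative.

```python
n = 60                                    # total stock of digits (an assumed value)
bases = (2, 3, 4, 5, 6, 10)

# With n digits available, a q-ary system gets n // q ranks, hence q**(n // q) numbers.
capacity = {q: q ** (n // q) for q in bases}
best = max(capacity, key=capacity.get)

for q in bases:
    print(q, capacity[q])
print("most efficient base among those tried:", best)
```

For real $x$ the function $x^{n/x}$ attains its maximum at $x = e$, so among integer bases the ternary system comes out ahead, with the binary and quaternary systems tied just behind it.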


2.3 Basic Lemmas Connected with the Completeness 
of the Real Numbers 


In this section we shall establish some simple useful principles, each of which
could have been used as the axiom of completeness in our construction of the
real numbers.¹⁰

We have called these principles basic lemmas in view of their extensive 
application in the proofs of a wide variety of theorems in analysis. 


10 See Problem 4 at the end of this section. 
2.3.1 The Nested Interval Lemma (Cauchy-Cantor Principle)


Definition 1. A function $f : \mathbb{N} \to X$ of a natural-number argument is called
a sequence or, more fully, a sequence of elements of $X$.


The value $f(n)$ of the function $f$ corresponding to the number $n \in \mathbb{N}$ is
often denoted $x_n$ and called the $n$th term of the sequence.


Definition 2. Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of sets. If
$X_1 \supset X_2 \supset \cdots \supset X_n \supset \cdots$, that is, $X_n \supset X_{n+1}$ for all $n \in \mathbb{N}$, we say the sequence is
nested.


Lemma. (Cauchy-Cantor). For any nested sequence $I_1 \supset I_2 \supset \cdots \supset I_n \supset \cdots$
of closed intervals, there exists a point $c \in \mathbb{R}$ belonging to all of these intervals.

If in addition it is known that for any $\varepsilon > 0$ there is an interval $I_k$
whose length $|I_k|$ is less than $\varepsilon$, then $c$ is the unique point common to all the
intervals.


Proof. We begin by remarking that for any two closed intervals $I_m = [a_m, b_m]$
and $I_n = [a_n, b_n]$ of the sequence we have $a_m \le b_n$. For otherwise we would
have $a_n \le b_n < a_m \le b_m$, that is, the intervals $I_m$ and $I_n$ would be mutually
disjoint, while one of them (the one with the larger index) is contained in the
other.

Thus the numerical sets $A = \{a_m \mid m \in \mathbb{N}\}$ and $B = \{b_n \mid n \in \mathbb{N}\}$ satisfy
the hypotheses of the axiom of completeness, by virtue of which there is a
number $c \in \mathbb{R}$ such that $a_m \le c \le b_n$ for all $a_m \in A$ and all $b_n \in B$. In
particular, $a_n \le c \le b_n$ for all $n \in \mathbb{N}$. But that means that the point $c$
belongs to all the intervals $I_n$.

Now let $c_1$ and $c_2$ be two points having this property. If they are different,
say $c_1 < c_2$, then for any $n \in \mathbb{N}$ we have $a_n \le c_1 < c_2 \le b_n$, and therefore
$0 < c_2 - c_1 \le b_n - a_n$, so that the length of an interval in the sequence cannot
be less than $c_2 - c_1$. Hence if there are intervals of arbitrarily small length in
the sequence, their common point is unique. $\square$
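
A concrete nested sequence of closed intervals with lengths tending to zero is produced by repeated halving. The Python sketch below is an illustration and not part of the text: it starts from the assumed interval $[1, 2]$ and keeps, at each step, the half $[a, b]$ with $a^2 \le 2 < b^2$, so that the unique common point guaranteed by the lemma is $\sqrt{2}$. Exact rational arithmetic is used so that the endpoints are not affected by rounding.

```python
from fractions import Fraction


def nested_intervals(steps: int) -> list[tuple[Fraction, Fraction]]:
    """Closed intervals [a, b] obtained by bisection, each with a*a <= 2 < b*b."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        if mid * mid <= 2:
            a = mid             # keep the right half
        else:
            b = mid             # keep the left half
        intervals.append((a, b))
    return intervals


for a, b in nested_intervals(5):
    print(float(a), float(b), b - a)
```

The endpoints are all rational while the common point is not, which is a preview of Problem 3 in Subsect. 2.3.4: within $\mathbb{Q}$ alone the same nested sequence has no common point.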


2.3.2 The Finite Covering Lemma (Borel—Lebesgue Principle, 
or Heine—Borel Theorem) 


Definition 3. A system $S = \{X\}$ of sets $X$ is said to cover a set $Y$ if
$Y \subset \bigcup_{X \in S} X$ (that is, if every element $y \in Y$ belongs to at least one of the
sets $X$ in the system $S$).


A subset of a set S = {X} that is a system of sets will be called a 
subsystem of S. Thus a subsystem of a system of sets is itself a system of sets 
of the same type. 
Lemma. (Borel-Lebesgue).¹¹ Every system of open intervals covering a
closed interval contains a finite subsystem that covers the closed interval.


Proof. Let $S = \{U\}$ be a system of open intervals $U$ that cover the closed
interval $[a, b] = I_1$. If the interval $I_1$ could not be covered by a finite set of
intervals of the system $S$, then, dividing $I_1$ into two halves, we would find
that at least one of the two halves, which we denote by $I_2$, does not admit
a finite covering. We now repeat this procedure with the interval $I_2$, and so
on.

In this way a nested sequence $I_1 \supset I_2 \supset \cdots \supset I_n \supset \cdots$ of closed intervals
arises, none of which admit a covering by a finite subsystem of $S$. Since
the length of the interval $I_n$ is $|I_n| = |I_1| \cdot 2^{-(n-1)}$, the sequence $\{I_n\}$ contains
intervals of arbitrarily small length (see the lemma in Paragraph c of Subsect.
2.2.4). But the nested interval lemma implies that there exists a point $c$
belonging to all of the intervals $I_n$, $n \in \mathbb{N}$. Since $c \in I_1 = [a, b]$ there exists
an open interval $]\alpha, \beta[ = U \in S$ containing $c$, that is, $\alpha < c < \beta$. Let
$\varepsilon = \min\{c - \alpha, \beta - c\}$. In the sequence just constructed, we find an interval $I_n$ such
that $|I_n| < \varepsilon$. Since $c \in I_n$ and $|I_n| < \varepsilon$, we conclude that $I_n \subset U = ]\alpha, \beta[$.
But this contradicts the fact that the interval $I_n$ cannot be covered by a finite
set of intervals from the system. $\square$


2.3.3 The Limit Point Lemma (Bolzano—Weierstrass Principle) 


We recall that we have defined a neighborhood of a point $x \in \mathbb{R}$ to be an open
interval containing the point and the $\delta$-neighborhood about $x$ to be the open
interval $]x - \delta, x + \delta[$.


Definition 4. A point $p \in \mathbb{R}$ is a limit point of the set $X \subset \mathbb{R}$ if every
neighborhood of the point contains an infinite subset of $X$.


This condition is obviously equivalent to the assertion that every neigh- 
borhood of p contains at least one point of X different from p itself. (Verify 
this!) 

We now give some examples. 


If $X = \{\frac{1}{n} \in \mathbb{R} \mid n \in \mathbb{N}\}$, the only limit point of $X$ is the point $0 \in \mathbb{R}$.


For an open interval $]a, b[$ every point of the closed interval $[a, b]$ is a limit
point, and there are no others.


For the set Q of rational numbers every point of R is a limit point; for, as 
we know, every open interval of the real numbers contains rational numbers. 


11 É. Borel (1871-1956) and H. Lebesgue (1875-1941), well-known French
mathematicians who worked in the theory of functions.

Lemma. (Bolzano-Weierstrass).¹² Every bounded infinite set of real numbers
has at least one limit point.


Proof. Let $X$ be the given subset of $\mathbb{R}$. It follows from the definition of
boundedness that $X$ is contained in some closed interval $I \subset \mathbb{R}$. We shall show that
at least one point of $I$ is a limit point of $X$.

If such were not the case, then each point $x \in I$ would have a neighborhood
$U(x)$ containing either no points of $X$ or at most a finite number. The
totality of such neighborhoods $\{U(x)\}$ constructed for the points $x \in I$ forms
a covering of $I$ by open intervals $U(x)$. By the finite covering lemma we can
extract a system $U(x_1), \ldots, U(x_n)$ of open intervals that cover $I$. But, since
$X \subset I$, this same system also covers $X$. However, there are only finitely many
points of $X$ in $U(x_i)$, and hence only finitely many in their union. That is,
$X$ is a finite set. This contradiction completes the proof. $\square$


2.3.4 Problems and Exercises 


1. Show that


a) if $I$ is any system of nested closed intervals, then

$\sup\{a \in \mathbb{R} \mid [a, b] \in I\} = \alpha \le \beta = \inf\{b \in \mathbb{R} \mid [a, b] \in I\}$

and

$[\alpha, \beta] = \bigcap_{[a, b] \in I} [a, b]$;


b) if $I$ is a system of nested open intervals $]a, b[$, the intersection $\bigcap_{]a, b[ \in I} ]a, b[$ may
happen to be empty.
Hint: $]a_n, b_n[ = ]0, \frac{1}{n}[$.


2. Show that 


a) from a system of closed intervals covering a closed interval it is not always 
possible to choose a finite subsystem covering the interval; 


b) from a system of open intervals covering an open interval it is not always 
possible to choose a finite subsystem covering the interval; 


c) from a system of closed intervals covering an open interval it is not always 
possible to choose a finite subsystem covering the interval. 


3. Show that if we take only the set Q of rational numbers instead of the complete 
set R of real numbers, taking a closed interval, open interval, and neighborhood of 
a point $r \in \mathbb{Q}$ to mean respectively the corresponding subsets of $\mathbb{Q}$, then none of
the three lemmas proved above remains true. 


12 B. Bolzano (1781-1848) — Czech mathematician and philosopher. 
K. Weierstrass (1815-1897) — German mathematician who devoted a great deal 
of attention to the logical foundations of mathematical analysis. 
4. Show that we obtain an axiom system equivalent to the one already given if we
take as the axiom of completeness 

a) the Bolzano—Weierstrass principle 

or 

b) the Borel—Lebesgue principle (Heine—Borel theorem). 

Hint: The principle of Archimedes and the axiom of completeness in the earlier 
form both follow from a). 

c) Replacing the axiom of completeness by the Cauchy—Cantor principle leads 
to a system of axioms that becomes equivalent to the original system if we also 
postulate the principle of Archimedes. (See Problem 21 in Subsect. 2.2.2.) 


2.4 Countable and Uncountable Sets 


We now make a small addition to the information about sets that was pro- 
vided in Chap. 1. This addition will be useful below. 


2.4.1 Countable Sets 


Definition 1. A set X is countable if it is equipollent with the set N of 
natural numbers, that is, card X = card N. 


Proposition. a) An infinite subset of a countable set is countable. 
b) The union of the sets of a finite or countable system of countable sets 
is a countable set. 


Proof. a) It suffices to verify that every infinite subset $E$ of $\mathbb{N}$ is equipollent
with $\mathbb{N}$. We construct the needed bijective mapping $f : \mathbb{N} \to E$ as follows.
There is a minimal element of $E_1 := E$, which we assign to the number $1 \in \mathbb{N}$
and denote $e_1 \in E$. The set $E$ is infinite, and therefore $E_2 := E \setminus \{e_1\}$ is
nonempty. We assign the minimal element of $E_2$ to the number 2 and call it
$e_2 \in E_2$. We then consider $E_3 := E \setminus \{e_1, e_2\}$, and so forth. Since $E$ is an
infinite set, this construction cannot terminate at any finite step with index
$n \in \mathbb{N}$. As follows from the principle of induction, we assign in this way a
certain number $e_n \in E$ to each $n \in \mathbb{N}$. The mapping $f : \mathbb{N} \to E$ is obviously
injective.

It remains to verify that it is surjective, that is, $f(\mathbb{N}) = E$. Let $e \in E$.
The set $\{n \in \mathbb{N} \mid n \le e\}$ is finite, and hence the subset of it $\{n \in E \mid n \le e\}$
is also finite. Let $k$ be the number of elements in the latter set. Then by
construction $e = e_k$.

b) If $X_1, \ldots, X_n, \ldots$ is a countable system of sets and each set
$X_m = \{x_m^1, \ldots, x_m^n, \ldots\}$ is itself countable, then since the cardinality of the set
$X = \bigcup_{n \in \mathbb{N}} X_n$, which consists of the elements $x_m^n$, where $m, n \in \mathbb{N}$, is not less than
the cardinality of each of the sets $X_m$, it follows that $X$ is an infinite set.

The element $x_m^n \in X_m$ can be identified with the pair $(m, n)$ of natural
numbers that defines it. Then the cardinality of $X$ cannot be greater than the
cardinality of the set of all such ordered pairs. But the mapping $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$
given by the formula $(m, n) \mapsto \frac{(m + n - 2)(m + n - 1)}{2} + m$, as one can easily verify,
is bijective. (It has a visualizable meaning: we are enumerating the points of
the plane with coordinates $(m, n)$ by successively passing from points of one
diagonal on which $m + n$ is constant to the points of the next such diagonal,
where the sum is one larger.)

Thus the set of ordered pairs $(m, n)$ of natural numbers is countable. But
then $\mathrm{card}\, X \le \mathrm{card}\, \mathbb{N}$, and since $X$ is an infinite set we conclude on the basis
of a) that $\mathrm{card}\, X = \mathrm{card}\, \mathbb{N}$. $\square$
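
The diagonal enumeration used in the proof is easy to test on an initial segment. The following Python sketch is an illustration only; the function name `pair_index` is an assumption, and the formula is the one given in the proof of part b).

```python
def pair_index(m: int, n: int) -> int:
    """Number assigned to the pair (m, n), m, n >= 1, by the diagonal enumeration."""
    return (m + n - 2) * (m + n - 1) // 2 + m


# All pairs on the first ten diagonals m + n = 2, 3, ..., 11.
pairs = [(m, s - m) for s in range(2, 12) for m in range(1, s)]
indices = [pair_index(m, n) for m, n in pairs]
print(indices[:10])
```

The diagonal $m + n = s$ contains $s - 1$ pairs, so the first ten diagonals contain $1 + 2 + \cdots + 10 = 55$ pairs, and the enumeration assigns them exactly the numbers $1, \ldots, 55$ without repetition.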


It follows from the proposition just proved that any subset of a countable 
set is either finite or countable. If it is known that a set is either finite 
or countable, we say it is at most countable. (An equivalent expression is 
$\mathrm{card}\, X \le \mathrm{card}\, \mathbb{N}$.)

We can now assert, in particular, that the union of an at most countable 
family of at most countable sets is at most countable. 


Corollaries 1) card Z = card N. 


2) $\mathrm{card}\, \mathbb{N}^2 = \mathrm{card}\, \mathbb{N}$.
(This result means that the direct product of countable sets is countable.)


3) $\mathrm{card}\, \mathbb{Q} = \mathrm{card}\, \mathbb{N}$, that is, the set of rational numbers is countable.


Proof. A rational number $\frac{m}{n}$ is defined by an ordered pair $(m, n)$ of integers.
Two pairs $(m, n)$ and $(m', n')$ define the same rational number if and only if
they are proportional. Thus, choosing as the unique pair representing each
rational number the pair $(m, n)$ with the smallest possible positive integer
denominator $n \in \mathbb{N}$, we find that the set $\mathbb{Q}$ is equipollent to some infinite
subset of the set $\mathbb{Z} \times \mathbb{Z}$. But $\mathrm{card}\, \mathbb{Z}^2 = \mathrm{card}\, \mathbb{N}$ and hence $\mathrm{card}\, \mathbb{Q} = \mathrm{card}\, \mathbb{N}$. $\square$


4) The set of algebraic numbers is countable. 


Proof. We remark first of all that the equality $\mathrm{card}\, (\mathbb{Q} \times \mathbb{Q}) = \mathrm{card}\, \mathbb{N}$ implies, by
induction, that $\mathrm{card}\, \mathbb{Q}^k = \mathrm{card}\, \mathbb{N}$ for every $k \in \mathbb{N}$.
An element $r \in \mathbb{Q}^k$ is an ordered set $(r_1, \ldots, r_k)$ of $k$ rational numbers.
An algebraic equation of degree $k$ with rational coefficients can be written
in the reduced form $x^k + r_1 x^{k-1} + \cdots + r_k = 0$, where the leading coefficient
is 1. Thus there are as many different algebraic equations of degree $k$ as there
are different ordered sets $(r_1, \ldots, r_k)$ of rational numbers, that is, a countable
set.
The algebraic equations with rational coefficients (of arbitrary degree)
also form a countable set, being a countable union (over degrees) of countable
sets. Each such equation has only a finite number of roots. Hence the set of
algebraic numbers is at most countable. But it is infinite, and hence countable.
$\square$


2.4.2 The Cardinality of the Continuum


Definition 2. The set $\mathbb{R}$ of real numbers is also called the number continuum,¹³
and its cardinality the cardinality of the continuum.


Theorem. (Cantor). $\mathrm{card}\, \mathbb{N} < \mathrm{card}\, \mathbb{R}$.


This theorem asserts that the infinite set R has cardinality greater than 
that of the infinite set N. 


Proof. We shall show that even the closed interval $[0, 1]$ is an uncountable
set.

Assume that it is countable, that is, can be written as a sequence
$x_1, x_2, \ldots, x_n, \ldots$. Take the point $x_1$ and on the interval $[0, 1] = I_0$ fix a
closed interval of positive length $I_1$ not containing the point $x_1$. In the interval
$I_1$ construct an interval $I_2$ not containing $x_2$. If the interval $I_n$ has
been constructed, then, since $|I_n| > 0$, we construct in it an interval $I_{n+1}$
so that $x_{n+1} \notin I_{n+1}$ and $|I_{n+1}| > 0$. By the nested interval lemma, there is a
point $c$ belonging to all of the intervals $I_0, I_1, \ldots, I_n, \ldots$. But this point of
the closed interval $I_0 = [0, 1]$ by construction cannot be any point of the
sequence $x_1, x_2, \ldots, x_n, \ldots$. $\square$
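
The construction in the proof can be carried out step by step for any finite initial segment of a proposed enumeration. The Python sketch below is one possible realization and not the only one: the rule of choosing the left or right closed third of the current interval, and the function name `avoid`, are assumptions made for the example.

```python
from fractions import Fraction


def avoid(points: list[Fraction]) -> tuple[Fraction, Fraction]:
    """Return a closed interval [a, b] inside [0, 1] containing none of the points."""
    a, b = Fraction(0), Fraction(1)
    for x in points:
        third = (b - a) / 3
        # The left and right closed thirds are disjoint, so x misses at least one.
        if not (a <= x <= a + third):
            b = a + third       # x is outside the left third: keep it
        else:
            a = b - third       # x lies in the left third: keep the right third
    return a, b


# A finite list of rationals in [0, 1], standing in for the start of an enumeration.
points = [Fraction(k, d) for d in range(1, 8) for k in range(d + 1)]
a, b = avoid(points)
print(a, b)
```

Every interval chosen lies inside the previous one, so a point excluded at some step stays excluded at all later steps; this is the property that the nested interval lemma turns into a point of $[0, 1]$ missed by the whole sequence.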


Corollaries 1) $\mathbb{Q} \ne \mathbb{R}$, and so irrational numbers exist.


2) There exist transcendental numbers, since the set of algebraic numbers is 
countable. 

(After solving Exercise 3 below, the reader will no doubt wish to reinter- 
pret this last proposition, stating it as follows: Algebraic numbers are occa- 
sionally encountered among the real numbers.) 

At the very dawn of set theory the question arose whether there exist 
sets of cardinality between countable sets and sets having cardinality of the 
continuum, and the conjecture was made, known as the continuum hypothesis, 
that there are no intermediate cardinalities. 

The question turned out to involve the deepest parts of the foundations of 
mathematics. It was definitively answered in 1963 by the contemporary Amer- 
ican mathematician P. Cohen. Cohen proved that the continuum hypothesis 
is undecidable by showing that neither the hypothesis nor its negation con- 
tradicts the standard axiom system of set theory, so that the continuum 
hypothesis can be neither proved nor disproved within that axiom system. 
This situation is very similar to the way in which Euclid’s fifth postulate on 
parallel lines is independent of the other axioms of geometry. 


2.4.3 Problems and Exercises 


1. Show that the set of real numbers has the same cardinality as the points of the 
interval $]-1, 1[$.


13 From the Latin continuum, meaning continuous, or solid. 
2. Give an explicit one-to-one correspondence between
a) the points of two open intervals; 
b) the points of two closed intervals; 
c) the points of a closed interval and the points of an open interval; 
d) the points of the closed interval [0, 1] and the set R. 


3. Show that 
a) every infinite set contains a countable subset; 


b) the set of even integers has the same cardinality as the set of all natural 
numbers;


c) the union of an infinite set and an at most countable set has the same 
cardinality as the original infinite set; 


d) the set of irrational numbers has the cardinality of the continuum; 


e) the set of transcendental numbers has the cardinality of the continuum. 


4. Show that 


a) the set of increasing sequences of natural numbers $\{n_1 < n_2 < \cdots\}$ has the
same cardinality as the set of fractions of the form $0.\alpha_1\alpha_2\ldots$;


b) the set of all subsets of a countable set has cardinality of the continuum. 


5. Show that 


a) the set $P(X)$ of subsets of a set $X$ has the same cardinality as the set of all
functions on $X$ with values 0, 1, that is, the set of mappings $f : X \to \{0, 1\}$;


b) for a finite set $X$ of $n$ elements, $\mathrm{card}\, P(X) = 2^n$;


c) taking account of the results of Exercises 4b) and 5a), one can write
$\mathrm{card}\, P(X) = 2^{\mathrm{card}\, X}$ and, in particular, $\mathrm{card}\, P(\mathbb{N}) = 2^{\mathrm{card}\, \mathbb{N}} = \mathrm{card}\, \mathbb{R}$;


d) for any set $X$
$\mathrm{card}\, X < 2^{\mathrm{card}\, X}$, in particular, $n < 2^n$ for any $n \in \mathbb{N}$.


Hint: See Cantor’s theorem in Subsect. 1.4.1. 


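6. Let $X_1, \ldots, X_m$ be a finite system of finite sets. Show that

$\mathrm{card} \left( \bigcup_{i=1}^{m} X_i \right) = \sum_{i_1} \mathrm{card}\, X_{i_1} - \sum_{i_1 < i_2} \mathrm{card}\, (X_{i_1} \cap X_{i_2}) + \sum_{i_1 < i_2 < i_3} \mathrm{card}\, (X_{i_1} \cap X_{i_2} \cap X_{i_3}) - \cdots + (-1)^{m-1} \mathrm{card}\, (X_1 \cap \cdots \cap X_m)$,

the summation extending over all sets of indices from 1 to $m$ satisfying the
inequalities under the summation signs.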
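
The alternating sum of Problem 6 can be evaluated directly for small examples and compared with the size of the union. The following Python sketch is only such a check; the function name `union_size` and the sample sets are assumptions chosen for illustration.

```python
from itertools import combinations


def union_size(sets: list[set]) -> int:
    """Cardinality of the union, computed by the alternating sum over intersections."""
    total = 0
    for r in range(1, len(sets) + 1):
        sign = (-1) ** (r - 1)
        for group in combinations(sets, r):      # index sets i_1 < ... < i_r
            total += sign * len(set.intersection(*group))
    return total


sample = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}, {1, 7}]
print(union_size(sample), len(set().union(*sample)))
```

The sum has $2^m - 1$ terms, so this is a verification device for small $m$ rather than an efficient way of counting.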
7. On the closed interval $[0, 1] \subset \mathbb{R}$ describe the sets of numbers $x \in [0, 1]$ whose
ternary representation $x = 0.\alpha_1\alpha_2\alpha_3\ldots$, $\alpha_i \in \{0, 1, 2\}$, has the property:


a) $\alpha_1 \ne 1$;
b) $(\alpha_1 \ne 1) \wedge (\alpha_2 \ne 1)$;
c) $\forall i \in \mathbb{N}\ (\alpha_i \ne 1)$ (the Cantor set).


8. (Continuation of Exercise 7.) Show that


a) the set of numbers $x \in [0, 1]$ whose ternary representation does not contain
1 has the same cardinality as the set of all numbers whose binary representation
has the form $0.\beta_1\beta_2\ldots$;


b) the Cantor set has the same cardinality as the closed interval $[0, 1]$.


3 Limits 


In discussing the various aspects of the concept of a real number we remarked 
in particular that in measuring real physical quantities we obtain sequences 
of approximate values with which one must then work. 

Such a state of affairs immediately raises at least the following three 
questions: 

1) What relation does the sequence of approximations so obtained have to 
the quantity being measured? We have in mind the mathematical aspect of
the question, that is, we wish to obtain an exact expression of what is meant
in general by the expression “sequence of approximate values” and the extent 
to which such a sequence describes the value of the quantity. Is the description 
unambiguous, or can the same sequence correspond to different values of the 
measured quantity? 

2) How are operations on the approximate values connected with the 
same operations on the exact values, and how can we characterize the opera- 
tions that can legitimately be carried out by replacing the exact values with 
approximate ones? 

3) How can one determine from a sequence of numbers whether it can be a 
sequence of arbitrarily precise approximations of the values of some quantity? 

The answer to these and related questions is provided by the concept of 
the limit of a function, one of the fundamental concepts of analysis. 

We begin our discussion of the theory of limits by considering the limit 
of a function of a natural-number argument (a sequence), in view of the 
fundamental role played by these functions, as already explained, and also 
because all the basic facts of the theory of limits can actually be clearly seen 
in this simplest situation. 


3.1 The Limit of a Sequence 


3.1.1 Definitions and Examples 
We recall the following definition. 


Definition 1. A function $f : \mathbb{N} \to X$ whose domain of definition is the set
of natural numbers is called a sequence.

The values $f(n)$ of the function $f$ are called the terms of the sequence. It
is customary to denote them by a symbol for an element of the set into which
the mapping goes, endowing each symbol with the corresponding index of the
argument. Thus, $x_n := f(n)$. In this connection the sequence itself is denoted
$\{x_n\}$, and also written as $x_1, x_2, \ldots, x_n, \ldots$. It is called a sequence in $X$ or a
sequence of elements of $X$.

The element $x_n$ is called the $n$th term of the sequence.

Throughout the next few sections we shall be considering only sequences
$f : \mathbb{N} \to \mathbb{R}$ of real numbers.


Definition 2. A number $A \in \mathbb{R}$ is called the limit of the numerical sequence
$\{x_n\}$ if for every neighborhood $V(A)$ of $A$ there exists an index $N$ (depending
on $V(A)$) such that all terms of the sequence having index larger than $N$
belong to the neighborhood $V(A)$.


We shall give an expression in formal logic for this definition below, but
we first point out another common formulation of the definition of the limit
of a sequence.

A number $A \in \mathbb{R}$ is called the limit of the sequence $\{x_n\}$ if for every $\varepsilon > 0$
there exists an index $N$ such that $|x_n - A| < \varepsilon$ for all $n > N$.

The equivalence of these two statements is easy to verify (verify it!) if we
remark that any neighborhood $V(A)$ of $A$ contains some $\varepsilon$-neighborhood of
the point $A$.

The second formulation of the definition of a limit means that no matter
what precision $\varepsilon > 0$ we have prescribed, there exists an index $N$ such that
the absolute error in approximating the number $A$ by terms of the sequence
$\{x_n\}$ is less than $\varepsilon$ as soon as $n > N$.

We now write these formulations of the definition of a limit in the language
of symbolic logic, agreeing that the expression “$\lim_{n \to \infty} x_n = A$” is to mean that
$A$ is the limit of the sequence $\{x_n\}$. Thus

$\left( \lim_{n \to \infty} x_n = A \right) := \forall V(A)\ \exists N \in \mathbb{N}\ \forall n > N\ (x_n \in V(A))$

and respectively

$\left( \lim_{n \to \infty} x_n = A \right) := \forall \varepsilon > 0\ \exists N \in \mathbb{N}\ \forall n > N\ (|x_n - A| < \varepsilon)$.


Definition 3. If $\lim_{n \to \infty} x_n = A$, we say that the sequence $\{x_n\}$ converges to
$A$ or tends to $A$ and write $x_n \to A$ as $n \to \infty$.
A sequence having a limit is said to be convergent. A sequence that does
not have a limit is said to be divergent.
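Let us consider some examples.

Example 1. $\lim_{n \to \infty} \frac{1}{n} = 0$, since $\left| \frac{1}{n} - 0 \right| = \frac{1}{n} < \varepsilon$ when $n > N = \left[ \frac{1}{\varepsilon} \right]$.¹


Example 2. $\lim_{n \to \infty} \frac{n + 1}{n} = 1$, since $\left| \frac{n + 1}{n} - 1 \right| = \frac{1}{n} < \varepsilon$ if $n > \left[ \frac{1}{\varepsilon} \right]$.


Example 3. $\lim_{n \to \infty} \left( 1 + \frac{(-1)^n}{n} \right) = 1$, since $\left| \left( 1 + \frac{(-1)^n}{n} \right) - 1 \right| = \frac{1}{n} < \varepsilon$ when
$n > \left[ \frac{1}{\varepsilon} \right]$.

Example 4. $\lim_{n \to \infty} \frac{\sin n}{n} = 0$, since $\left| \frac{\sin n}{n} - 0 \right| \le \frac{1}{n} < \varepsilon$ for $n > \left[ \frac{1}{\varepsilon} \right]$.


Example 5. $\lim_{n \to \infty} \frac{1}{q^n} = 0$ if $|q| > 1$.


Let us verify this last assertion using the definition of the limit. As was
shown in Paragraph c of Subsect. 2.2.4, for every $\varepsilon > 0$ there exists $N \in \mathbb{N}$
such that $\frac{1}{|q|^N} < \varepsilon$. Since $|q| > 1$, we shall have $\left| \frac{1}{q^n} - 0 \right| = \frac{1}{|q|^n} < \frac{1}{|q|^N} < \varepsilon$ for
$n > N$, and the condition in the definition of the limit is satisfied.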
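
The choice $N = \left[ \frac{1}{\varepsilon} \right]$ used in Examples 1 to 4 can be tried out numerically. The Python sketch below tests Example 4 over a finite stretch of indices; the function names and the length of the stretch (`horizon`) are assumptions, and a finite test of this kind illustrates the definition without proving anything.

```python
import math


def index_for(eps: float) -> int:
    """The index N = [1/eps] (integer part) used in Examples 1 to 4."""
    return math.floor(1 / eps)


def check_example_4(eps: float, horizon: int = 10_000) -> bool:
    """Test |sin(n)/n - 0| < eps for the first `horizon` indices n > N."""
    N = index_for(eps)
    return all(abs(math.sin(n) / n) < eps
               for n in range(N + 1, N + 1 + horizon))


for eps in (0.1, 0.01, 0.001):
    print(eps, index_for(eps), check_example_4(eps))
```

The estimate behind the check is the one in the text: for $n > N$ we have $n > \frac{1}{\varepsilon}$, hence $\left| \frac{\sin n}{n} \right| \le \frac{1}{n} < \varepsilon$.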


Example 6. The sequence $1, 2, \frac{1}{3}, 4, \frac{1}{5}, 6, \frac{1}{7}, \ldots$ whose $n$th term is $x_n = n^{(-1)^n}$,
$n \in \mathbb{N}$, is divergent.


Proof. Indeed, if $A$ were the limit of this sequence, then, as follows from
the definition of limit, any neighborhood of $A$ would contain all but a finite
number of terms of the sequence.

A number $A \ne 0$ cannot be the limit of this sequence; for if $\varepsilon = \frac{|A|}{2} > 0$,
all the terms of the sequence of the form $\frac{1}{2k+1}$ for which $\frac{1}{2k+1} < \frac{|A|}{2}$ lie outside
the $\varepsilon$-neighborhood of $A$.

But the number 0 also cannot be the limit, since, for example, there are
infinitely many terms of the sequence lying outside the 1-neighborhood of 0.
$\square$


Example 7. One can verify similarly that the sequence $1, -1, +1, -1, \ldots$, for
which $x_n = (-1)^n$, has no limit.


3.1.2 Properties of the Limit of a Sequence 


a. General Properties We assign to this group the properties possessed 
not only by numerical sequences, but by other kinds of sequences as well, as 
we shall see below, although at present we shall study these properties only 
for numerical sequences. 

A sequence assuming only one value will be called a constant sequence. 


1 We recall that $[x]$ is the integer part of the number $x$. (See Corollaries 7° and
10° of Sect. 2.2.)
Definition 4. If there exists a number $A$ and an index $N$ such that $x_n = A$
for all $n > N$, the sequence $\{x_n\}$ will be called ultimately constant.


Definition 5. A sequence $\{x_n\}$ is bounded if there exists $M$ such that
$|x_n| < M$ for all $n \in \mathbb{N}$.


Theorem 1. a) An ultimately constant sequence converges. 

b) Any neighborhood of the limit of a sequence contains all but a finite 
number of terms of the sequence. 

c) A convergent sequence cannot have two different limits. 

d) A convergent sequence is bounded. 


Proof. a) If $x_n = A$ for $n > N$, then for any neighborhood $V(A)$ of $A$ we have $x_n \in V(A)$ when $n > N$, that is, $\lim_{n\to\infty} x_n = A$.

b) This assertion follows immediately from the definition of a convergent sequence.

c) This is the most important part of the theorem. Let $\lim_{n\to\infty} x_n = A_1$ and $\lim_{n\to\infty} x_n = A_2$. If $A_1 \ne A_2$, we fix nonintersecting neighborhoods $V(A_1)$ and $V(A_2)$ of $A_1$ and $A_2$. These neighborhoods might be, for example, the $\delta$-neighborhoods of $A_1$ and $A_2$ for $\delta < \frac{1}{2}|A_1 - A_2|$. By definition of limit we find indices $N_1$ and $N_2$ such that $x_n \in V(A_1)$ for all $n > N_1$ and $x_n \in V(A_2)$ for all $n > N_2$. But then for $n > \max\{N_1, N_2\}$ we have $x_n \in V(A_1) \cap V(A_2)$. But this is impossible, since $V(A_1) \cap V(A_2) = \varnothing$.

d) Let $\lim_{n\to\infty} x_n = A$. Setting $\varepsilon = 1$ in the definition of a limit, we find $N$ such that $|x_n - A| < 1$ for all $n > N$. Then for $n > N$ we have $|x_n| < |A| + 1$. If we now take $M > \max\{|x_1|, \ldots, |x_N|, |A| + 1\}$ we find that $|x_n| < M$ for all $n \in \mathbb{N}$. □


b. Passage to the Limit and the Arithmetic Operations 


Definition 6. If $\{x_n\}$ and $\{y_n\}$ are two numerical sequences, their sum, product, and quotient (in accordance with the general definition of sum, product, and quotient of functions) are the sequences

$$\{(x_n + y_n)\}, \qquad \{(x_n \cdot y_n)\}, \qquad \left\{\left(\frac{x_n}{y_n}\right)\right\}.$$

The quotient, of course, is defined only when $y_n \ne 0$ for all $n \in \mathbb{N}$.


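Theorem 2. Let $\{x_n\}$ and $\{y_n\}$ be numerical sequences. If $\lim_{n\to\infty} x_n = A$ and $\lim_{n\to\infty} y_n = B$, then

a) $\lim_{n\to\infty}(x_n + y_n) = A + B$;

b) $\lim_{n\to\infty}(x_n \cdot y_n) = A \cdot B$;

c) $\lim_{n\to\infty}\frac{x_n}{y_n} = \frac{A}{B}$, provided $y_n \ne 0$ $(n = 1, 2, \ldots)$ and $B \ne 0$.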
Proof. As an exercise we use the estimates for the absolute errors that arise under arithmetic operations with approximate values of quantities, which we already know (Subsect. 2.2.4).

Set $|A - x_n| = \Delta(x_n)$, $|B - y_n| = \Delta(y_n)$. Then for case a) we have

$$|(A + B) - (x_n + y_n)| \le \Delta(x_n) + \Delta(y_n).$$

Suppose $\varepsilon > 0$ is given. Since $\lim_{n\to\infty} x_n = A$, there exists $N'$ such that $\Delta(x_n) < \varepsilon/2$ for all $n > N'$. Similarly, since $\lim_{n\to\infty} y_n = B$, there exists $N''$ such that $\Delta(y_n) < \varepsilon/2$ for all $n > N''$. Then for $n > \max\{N', N''\}$ we shall have

$$|(A + B) - (x_n + y_n)| < \varepsilon,$$

which, by definition of limit, proves assertion a).
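b) We know that

$$|(A \cdot B) - (x_n \cdot y_n)| \le |x_n|\Delta(y_n) + |y_n|\Delta(x_n) + \Delta(x_n)\cdot\Delta(y_n).$$

Given $\varepsilon > 0$ find numbers $N'$ and $N''$ such that

$$\forall n > N' \ \left(\Delta(x_n) < \min\left\{1, \frac{\varepsilon}{3(|B| + 1)}\right\}\right),$$

$$\forall n > N'' \ \left(\Delta(y_n) < \min\left\{1, \frac{\varepsilon}{3(|A| + 1)}\right\}\right).$$

Then for $n > N = \max\{N', N''\}$ we shall have

$$|x_n| < |A| + \Delta(x_n) < |A| + 1,$$
$$|y_n| < |B| + \Delta(y_n) < |B| + 1,$$
$$\Delta(x_n)\cdot\Delta(y_n) < \min\left\{1, \frac{\varepsilon}{3}\right\}\cdot\min\left\{1, \frac{\varepsilon}{3}\right\} \le \frac{\varepsilon}{3}.$$

Hence for $n > N$ we have

$$|x_n|\Delta(y_n) < (|A| + 1)\cdot\frac{\varepsilon}{3(|A| + 1)} = \frac{\varepsilon}{3},$$
$$|y_n|\Delta(x_n) < (|B| + 1)\cdot\frac{\varepsilon}{3(|B| + 1)} = \frac{\varepsilon}{3},$$
$$\Delta(x_n)\cdot\Delta(y_n) < \frac{\varepsilon}{3},$$

and therefore $|AB - x_n y_n| < \varepsilon$ for $n > N$.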
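c) We use the estimate

$$\left|\frac{A}{B} - \frac{x_n}{y_n}\right| \le \frac{|x_n|\Delta(y_n) + |y_n|\Delta(x_n)}{y_n^2}\cdot\frac{1}{1 - \delta(y_n)},$$

where $\delta(y_n) = \frac{\Delta(y_n)}{|y_n|}$.

For a given $\varepsilon > 0$ we find numbers $N'$ and $N''$ such that

$$\forall n > N' \ \left(\Delta(x_n) < \min\left\{1, \frac{\varepsilon|B|}{8}\right\}\right),$$

$$\forall n > N'' \ \left(\Delta(y_n) < \min\left\{\frac{|B|}{4}, \frac{\varepsilon\cdot B^2}{16(|A| + 1)}\right\}\right).$$

Then for $n > N = \max\{N', N''\}$ we shall have

$$|x_n| < |A| + \Delta(x_n) < |A| + 1,$$
$$|y_n| > |B| - \Delta(y_n) > |B| - \frac{|B|}{4} > \frac{|B|}{2},$$
$$\frac{1}{|y_n|} < \frac{2}{|B|},$$
$$0 < \delta(y_n) = \frac{\Delta(y_n)}{|y_n|} < \frac{|B|/4}{|B|/2} = \frac{1}{2},$$
$$1 - \delta(y_n) > \frac{1}{2},$$

and therefore

$$|x_n|\cdot\frac{1}{y_n^2}\,\Delta(y_n) < (|A| + 1)\cdot\frac{4}{B^2}\cdot\frac{\varepsilon\cdot B^2}{16(|A| + 1)} = \frac{\varepsilon}{4},$$
$$\left|\frac{1}{y_n}\right|\Delta(x_n) < \frac{2}{|B|}\cdot\frac{\varepsilon|B|}{8} = \frac{\varepsilon}{4},$$
$$0 < \frac{1}{1 - \delta(y_n)} < 2,$$

and consequently

$$\left|\frac{A}{B} - \frac{x_n}{y_n}\right| < \varepsilon \quad\text{when } n > N. \ \square$$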


Remark. The statement of the theorem admits another, less constructive 
method of proof that is probably known to the reader from the high-school 
course in the rudiments of analysis. We shall mention this method when we 
discuss the limit of an arbitrary function. But here, when considering the 
limit of a sequence, we wished to call attention to the way in which bounds 
on the errors in the result of an arithmetic operation can be used to set
permissible bounds on the errors in the values of quantities on which an 
operation is carried out. 
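The way a bound on the error of a product dictates permissible errors in the factors can be made concrete. The sketch below (function names are ours, chosen for illustration) computes the tolerances $\min\{1, \varepsilon/(3(|B|+1))\}$ and $\min\{1, \varepsilon/(3(|A|+1))\}$ from part b) of the proof of Theorem 2 and measures the resulting error of the product at sample points strictly inside those tolerances. It is a numerical illustration under these assumptions, not a substitute for the proof.

```python
from fractions import Fraction
from itertools import product

def product_tolerances(A, B, eps):
    """Permissible errors (dx, dy) in x and y, as in part b) of the proof."""
    A, B, eps = Fraction(A), Fraction(B), Fraction(eps)
    dx = min(Fraction(1), eps / (3 * (abs(B) + 1)))
    dy = min(Fraction(1), eps / (3 * (abs(A) + 1)))
    return dx, dy

def worst_product_error(A, B, eps, shrink=Fraction(99, 100)):
    """Largest |AB - xy| over the four points x = A ± shrink*dx, y = B ± shrink*dy.

    `shrink` < 1 keeps the sample points strictly inside the tolerances.
    """
    A, B = Fraction(A), Fraction(B)
    dx, dy = product_tolerances(A, B, eps)
    return max(abs(A * B - (A + sx * shrink * dx) * (B + sy * shrink * dy))
               for sx, sy in product((-1, 1), repeat=2))
```

For instance, `worst_product_error(2, -3, Fraction(1, 10))` stays below `Fraction(1, 10)`, in agreement with the estimate $|AB - x_n y_n| < \varepsilon$.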


c. Passage to the Limit and Inequalities 


Theorem 3. a) Let $\{x_n\}$ and $\{y_n\}$ be two convergent sequences with $\lim_{n\to\infty} x_n = A$ and $\lim_{n\to\infty} y_n = B$. If $A < B$, then there exists an index $N \in \mathbb{N}$ such that $x_n < y_n$ for all $n > N$.

b) Suppose the sequences $\{x_n\}$, $\{y_n\}$, and $\{z_n\}$ are such that $x_n \le y_n \le z_n$ for all $n > N \in \mathbb{N}$. If the sequences $\{x_n\}$ and $\{z_n\}$ both converge to the same limit, then the sequence $\{y_n\}$ also converges to that limit.


Proof. a) Choose a number $C$ such that $A < C < B$. By definition of limit, we can find numbers $N'$ and $N''$ such that $|x_n - A| < C - A$ for all $n > N'$ and $|y_n - B| < B - C$ for all $n > N''$. Then for $n > N = \max\{N', N''\}$ we shall have $x_n < A + C - A = C = B - (B - C) < y_n$.

b) Suppose $\lim_{n\to\infty} x_n = \lim_{n\to\infty} z_n = A$. Given $\varepsilon > 0$ choose $N'$ and $N''$ such that $A - \varepsilon < x_n$ for all $n > N'$ and $z_n < A + \varepsilon$ for all $n > N''$. Then for $n > N = \max\{N', N''\}$ we shall have $A - \varepsilon < x_n \le y_n \le z_n < A + \varepsilon$, which says $|y_n - A| < \varepsilon$, that is, $A = \lim_{n\to\infty} y_n$. □


Corollary. Suppose $\lim_{n\to\infty} x_n = A$ and $\lim_{n\to\infty} y_n = B$. If there exists $N$ such that for all $n > N$ we have

a) $x_n > y_n$, then $A \ge B$;
b) $x_n \ge y_n$, then $A \ge B$;
c) $x_n > B$, then $A \ge B$;
d) $x_n \ge B$, then $A \ge B$.


Proof. Arguing by contradiction, we obtain the first two assertions immedi- 
ately from part a) of the theorem. The third and fourth assertions are the 
special cases of the first two obtained when yn = B. O 


It is worth noting that strict inequality may become equality in the limit. For example, $\frac{1}{n} > 0$ for all $n \in \mathbb{N}$, yet $\lim_{n\to\infty}\frac{1}{n} = 0$.


3.1.3 Questions Involving the Existence of the Limit of a Sequence 


a. The Cauchy Criterion 


Definition 7. A sequence $\{x_n\}$ is called a fundamental or Cauchy sequence² if for any $\varepsilon > 0$ there exists an index $N \in \mathbb{N}$ such that $|x_m - x_n| < \varepsilon$ whenever $n > N$ and $m > N$.


Theorem 4. (Cauchy’s convergence criterion). A numerical sequence con- 
verges if and only if it is a Cauchy sequence. 


2 Bolzano introduced Cauchy sequences in an attempt to prove, without having 
at his disposal a precise concept of a real number, that a fundamental sequence 
converges. Cauchy gave a proof, taking the nested interval principle, which was 
later justified by Cantor, as obvious. 
Proof. Suppose $\lim_{n\to\infty} x_n = A$. Given $\varepsilon > 0$, we find an index $N$ such that $|x_n - A| < \frac{\varepsilon}{2}$ for $n > N$. Then if $m > N$ and $n > N$, we have $|x_m - x_n| \le |x_m - A| + |x_n - A| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$, and we have thus verified that the sequence is a Cauchy sequence.

Now let $\{x_k\}$ be a fundamental sequence. Given $\varepsilon > 0$, we find an index $N$ such that $|x_m - x_k| < \frac{\varepsilon}{3}$ when $m \ge N$ and $k \ge N$. Fixing $m = N$, we find that for any $k > N$

$$x_N - \frac{\varepsilon}{3} < x_k < x_N + \frac{\varepsilon}{3}, \qquad (3.1)$$

but since only a finite number of terms of the sequence have indices not larger than $N$, we have shown that a fundamental sequence is bounded.

For $n \in \mathbb{N}$ we now set $a_n := \inf_{k \ge n} x_k$ and $b_n := \sup_{k \ge n} x_k$.

It is clear from these definitions that $a_n \le a_{n+1} \le b_{n+1} \le b_n$ (since the greatest lower bound does not decrease and the least upper bound does not increase when we pass to a smaller set). By the nested interval principle, there is a point $A$ common to all of the closed intervals $[a_n, b_n]$.

Since

$$a_n \le A \le b_n$$

for any $n \in \mathbb{N}$ and

$$a_n = \inf_{k \ge n} x_k \le x_k \le \sup_{k \ge n} x_k = b_n$$

for $k \ge n$, it follows that

$$|A - x_k| \le b_n - a_n. \qquad (3.2)$$

But it follows from Eq. (3.1) that

$$x_N - \frac{\varepsilon}{3} \le \inf_{k \ge n} x_k = a_n \le b_n = \sup_{k \ge n} x_k \le x_N + \frac{\varepsilon}{3}$$

for $n > N$, and therefore

$$b_n - a_n \le \frac{2\varepsilon}{3} < \varepsilon \qquad (3.3)$$

for $n > N$. Comparing Eqs. (3.2) and (3.3), we find that

$$|A - x_k| < \varepsilon$$

for any $k > N$, and we have proved that $\lim_{k\to\infty} x_k = A$. □


Example 8. The sequence $(-1)^n$ $(n = 1, 2, \ldots)$ has no limit, since it is not a Cauchy sequence. Even though this fact is obvious, we shall give a formal verification. The negation of the statement that $\{x_n\}$ is a Cauchy sequence is the following:

$$\exists \varepsilon > 0 \ \forall N \in \mathbb{N} \ \exists n > N \ \exists m > N \ (|x_m - x_n| \ge \varepsilon),$$

that is, there exists $\varepsilon > 0$ such that for any $N \in \mathbb{N}$ two numbers $n, m$ larger than $N$ exist for which $|x_m - x_n| \ge \varepsilon$.

In our case it suffices to set $\varepsilon = 1$. Then for any $N \in \mathbb{N}$ we shall have $|x_{N+1} - x_{N+2}| = |1 - (-1)| = 2 > 1 = \varepsilon$.


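Example 9. Let

$$x_1 = 0, \quad x_2 = 0.\alpha_1, \quad x_3 = 0.\alpha_1\alpha_2, \quad \ldots, \quad x_n = 0.\alpha_1\alpha_2\ldots\alpha_n, \quad \ldots$$

be a sequence of finite binary fractions in which each successive fraction is obtained by adjoining a 0 or a 1 to its predecessor. We shall show that such a sequence always converges. Let $m > n$. Let us estimate the difference $x_m - x_n$:

$$|x_m - x_n| = \left|\frac{\alpha_{n+1}}{2^{n+1}} + \cdots + \frac{\alpha_m}{2^m}\right| \le \frac{1}{2^{n+1}} + \cdots + \frac{1}{2^m} = \frac{\left(\frac{1}{2}\right)^{n+1} - \left(\frac{1}{2}\right)^{m+1}}{1 - \frac{1}{2}} < \frac{1}{2^n}.$$

Thus, given $\varepsilon > 0$, if we choose $N$ so that $\frac{1}{2^N} < \varepsilon$, we obtain the estimate $|x_m - x_n| < \frac{1}{2^n} < \frac{1}{2^N} < \varepsilon$ for all $m > n > N$, which proves that the sequence $\{x_n\}$ is a Cauchy sequence.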
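The estimate $|x_m - x_n| < 2^{-n}$ can be checked on any particular string of binary digits. The sketch below is an illustration with hypothetical helper names; here the fraction built from the first $n$ digits plays the role of $0.\alpha_1\ldots\alpha_n$.

```python
from fractions import Fraction

def binary_fraction(digits):
    """Exact value of 0.d1 d2 ... dn in base 2 (each digit is 0 or 1)."""
    return sum((Fraction(d, 2 ** (i + 1)) for i, d in enumerate(digits)),
               Fraction(0))

def estimate_holds(digits):
    """Check |x_m - x_n| < 1/2**n for all n < m, where x_n uses the first n digits."""
    xs = [binary_fraction(digits[:n]) for n in range(len(digits) + 1)]
    return all(abs(xs[m] - xs[n]) < Fraction(1, 2 ** n)
               for n in range(len(xs)) for m in range(n + 1, len(xs)))
```

The all-ones digit string is the extreme case: there the difference equals $2^{-n} - 2^{-m}$, which is still strictly less than $2^{-n}$.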


Example 10. Consider the sequence $\{x_n\}$, where

$$x_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n}.$$

Since

$$|x_{2n} - x_n| = \frac{1}{n+1} + \cdots + \frac{1}{n+n} \ge n \cdot \frac{1}{2n} = \frac{1}{2}$$

for all $n \in \mathbb{N}$, the Cauchy criterion implies immediately that this sequence does not have a limit.
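The lower bound on $x_{2n} - x_n$ is easy to observe with exact arithmetic. The following sketch (helper names `harmonic` and `gap` are ours) computes the difference for a range of $n$; note that for $n = 1$ the difference is exactly $\frac{1}{2}$.

```python
from fractions import Fraction

def harmonic(n):
    """x_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def gap(n):
    """x_{2n} - x_n = 1/(n+1) + ... + 1/(2n)."""
    return harmonic(2 * n) - harmonic(n)
```

Since the gap never drops below $\frac{1}{2}$, the Cauchy condition fails for every $\varepsilon \le \frac{1}{2}$.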


b. A Criterion for the Existence of the Limit
of a Monotonic Sequence

Definition 8. A sequence $\{x_n\}$ is increasing if $x_n < x_{n+1}$ for all $n \in \mathbb{N}$, nondecreasing if $x_n \le x_{n+1}$ for all $n \in \mathbb{N}$, nonincreasing if $x_n \ge x_{n+1}$ for all $n \in \mathbb{N}$, and decreasing if $x_n > x_{n+1}$ for all $n \in \mathbb{N}$. Sequences of these four types are called monotonic sequences.

Definition 9. A sequence $\{x_n\}$ is bounded above if there exists a number $M$ such that $x_n < M$ for all $n \in \mathbb{N}$.


Theorem 5. (Weierstrass). In order for a nondecreasing sequence to have a 
limit it is necessary and sufficient that it be bounded above. 
Proof. The fact that any convergent sequence is bounded was proved above under general properties of the limit of a sequence. For that reason only the sufficiency assertion is of interest.

By hypothesis the set of values of the sequence $\{x_n\}$ is bounded above and hence has a least upper bound $s = \sup_{n \in \mathbb{N}} x_n$.

By definition of the least upper bound, for every $\varepsilon > 0$ there exists an element $x_N \in \{x_n\}$ such that $s - \varepsilon < x_N \le s$. Since the sequence $\{x_n\}$ is nondecreasing, we now find that $s - \varepsilon < x_N \le x_n \le s$ for all $n > N$. That is, $|s - x_n| = s - x_n < \varepsilon$. Thus we have proved that $\lim_{n\to\infty} x_n = s$. □

Of course an analogous theorem can be stated and proved for a nonincreasing sequence that is bounded below. In this case $\lim_{n\to\infty} x_n = \inf_{n \in \mathbb{N}} x_n$.

Remark. The boundedness from above (resp. below) of a nondecreasing (resp. nonincreasing) sequence is obviously equivalent to the boundedness of that sequence.


Let us consider some useful examples. 


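Example 11. $\lim_{n\to\infty}\frac{n}{q^n} = 0$ if $q > 1$.

Proof. Indeed, if $x_n = \frac{n}{q^n}$, then $x_{n+1} = \frac{n+1}{nq}x_n$ for $n \in \mathbb{N}$. Since

$$\lim_{n\to\infty}\frac{n+1}{nq} = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)\frac{1}{q} = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)\cdot\lim_{n\to\infty}\frac{1}{q} = 1\cdot\frac{1}{q} = \frac{1}{q} < 1,$$

there exists an index $N$ such that $\frac{n+1}{nq} < 1$ for $n > N$. Thus we shall have $x_{n+1} < x_n$ for $n > N$, so that the sequence will be monotonically decreasing from index $N$ on. As one can see from the definition of a limit, a finite set of terms of a sequence has no effect on the convergence of a sequence or its limit, so that it now suffices to find the limit of the sequence $x_{N+1} > x_{N+2} > \cdots$.

The terms of this sequence are positive, that is, the sequence is bounded below. Therefore it has a limit.

Let $x = \lim_{n\to\infty} x_n$. It now follows from the relation $x_{n+1} = \frac{n+1}{nq}x_n$ that

$$x = \lim_{n\to\infty}(x_{n+1}) = \lim_{n\to\infty}\left(\frac{n+1}{nq}x_n\right) = \lim_{n\to\infty}\frac{n+1}{nq}\cdot\lim_{n\to\infty} x_n = \frac{1}{q}x,$$

from which we find $\left(1 - \frac{1}{q}\right)x = 0$, and so $x = 0$. □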
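The index from which $x_n = n/q^n$ starts to decrease can be found explicitly: the ratio $\frac{n+1}{nq}$ is less than 1 exactly when $n > \frac{1}{q-1}$. The sketch below uses this observation, which is ours and not part of the proof above; the function names are likewise hypothetical.

```python
from fractions import Fraction
from math import floor

def term(n, q):
    """x_n = n / q**n as an exact fraction."""
    return Fraction(n) / Fraction(q) ** n

def ratio(n, q):
    """x_{n+1} / x_n = (n + 1) / (n q)."""
    return Fraction(n + 1, n) / Fraction(q)

def decreasing_from(q):
    """An index N such that ratio(n, q) < 1 for every n > N (requires q > 1)."""
    return floor(1 / (Fraction(q) - 1))
```

For $q = \frac{11}{10}$ this gives $N = 10$: the ratio equals 1 at $n = 10$ and is below 1 from $n = 11$ on, so the closer $q$ is to 1, the later the decrease begins.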


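Corollary 1.

$$\lim_{n\to\infty}\sqrt[n]{n} = 1.$$

Proof. By what was just proved, for a given $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $1 \le n < (1 + \varepsilon)^n$ for all $n > N$. Then for $n > N$ we obtain $1 \le \sqrt[n]{n} < 1 + \varepsilon$ and hence $\lim_{n\to\infty}\sqrt[n]{n} = 1$. □

Corollary 2.

$$\lim_{n\to\infty}\sqrt[n]{a} = 1 \quad\text{for any } a > 0.$$

Proof. Assume first that $a \ge 1$. For any $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $1 \le a < (1 + \varepsilon)^n$ for all $n > N$, and we then have $1 \le \sqrt[n]{a} < 1 + \varepsilon$ for all $n > N$, which says $\lim_{n\to\infty}\sqrt[n]{a} = 1$.

For $0 < a < 1$, we have $1 < \frac{1}{a}$, and then

$$\lim_{n\to\infty}\sqrt[n]{a} = \lim_{n\to\infty}\frac{1}{\sqrt[n]{\frac{1}{a}}} = \frac{1}{\lim_{n\to\infty}\sqrt[n]{\frac{1}{a}}} = 1. \ \square$$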
Example 12. $\lim_{n\to\infty}\frac{q^n}{n!} = 0$; here $q$ is any real number, $n \in \mathbb{N}$, and $n! := 1 \cdot 2 \cdot \ldots \cdot n$.

Proof. If $q = 0$, the assertion is obvious. Further, since $\left|\frac{q^n}{n!}\right| = \frac{|q|^n}{n!}$, it suffices to prove the assertion for $q > 0$. Reasoning as in Example 11, we remark that $x_{n+1} = \frac{q}{n+1}x_n$. Since the set of natural numbers is not bounded above, there exists an index $N$ such that $0 < \frac{q}{n+1} < 1$ for all $n > N$. Then for $n > N$ we shall have $x_{n+1} < x_n$, and since the terms of the sequence are positive, one can now guarantee that the limit $\lim_{n\to\infty} x_n = x$ exists. But then

$$x = \lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty}\frac{q}{n+1}\cdot\lim_{n\to\infty} x_n = 0 \cdot x = 0. \ \square$$
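A short computation shows the behaviour described in the proof: the terms $q^n/n!$ may grow at first, but they decrease once $\frac{q}{n+1} < 1$. In the sketch below the choice $N = [q]$ is one convenient index (an assumption of ours, not the smallest possible), and the function names are illustrative.

```python
from fractions import Fraction
from math import factorial, floor

def term(q, n):
    """x_n = q**n / n! as an exact fraction."""
    return Fraction(q) ** n / factorial(n)

def decreasing_from(q):
    """An index N with q/(n+1) < 1 for every n > N (for q > 0)."""
    return floor(Fraction(q))
```

With $q = 10$ the terms increase up to $n = 9$, satisfy $x_{10} = x_9$, and decrease strictly afterwards.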


c. The Number e 


Example 13. Let us prove that the limit $\lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n$ exists.

In this case the limit is a number denoted by the letter e, after Euler. This number is just as central to analysis as the number 1 to arithmetic or $\pi$ to geometry. We shall revisit it many times for a wide variety of reasons.

We begin by verifying the following inequality, sometimes called Jakob Bernoulli's inequality:³

$$(1 + \alpha)^n \ge 1 + n\alpha \quad\text{for } n \in \mathbb{N} \text{ and } \alpha > -1.$$

Proof. The assertion is true for $n = 1$. If it holds for $n \in \mathbb{N}$, then it must also hold for $n + 1$, since we then have

$$(1 + \alpha)^{n+1} = (1 + \alpha)(1 + \alpha)^n \ge (1 + \alpha)(1 + n\alpha) = 1 + (n + 1)\alpha + n\alpha^2 \ge 1 + (n + 1)\alpha.$$


3 Jakob (James) Bernoulli (1654-1705) — Swiss mathematician, a member of the 
famous Bernoulli family of scholars. He was one of the founders of the calculus 
of variations and probability theory. 




By the principle of induction the assertion is true for all $n \in \mathbb{N}$.
Incidentally, the computation shows that strict inequality holds if $\alpha \ne 0$ and $n > 1$. □


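We now show that the sequence $y_n = \left(1 + \frac{1}{n}\right)^{n+1}$ is decreasing.

Proof. Let $n \ge 2$. Using Bernoulli's inequality, we find that

$$\frac{y_{n-1}}{y_n} = \frac{\left(1 + \frac{1}{n-1}\right)^n}{\left(1 + \frac{1}{n}\right)^{n+1}} = \frac{n^{2n}}{(n^2 - 1)^n}\cdot\frac{n}{n+1} = \left(1 + \frac{1}{n^2 - 1}\right)^n\frac{n}{n+1} \ge \left(1 + \frac{n}{n^2 - 1}\right)\frac{n}{n+1} > \left(1 + \frac{1}{n}\right)\frac{n}{n+1} = 1.$$

Since the terms of the sequence are positive, the limit $\lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^{n+1}$ exists.

But we then have

$$\lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^{n+1}\left(1 + \frac{1}{n}\right)^{-1} = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^{n+1}\cdot\lim_{n\to\infty}\frac{1}{1 + \frac{1}{n}} = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^{n+1}. \ \square$$

Thus we make the following definition:

Definition 10.

$$e := \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n.$$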
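The monotonicity of $\left(1 + \frac{1}{n}\right)^{n+1}$ and its relation to $\left(1 + \frac{1}{n}\right)^n$ can be observed with exact rational arithmetic. The names `lower` and `upper` in the sketch below are ours; the sketch checks finitely many indices and is meant only as an illustration of the argument.

```python
from fractions import Fraction

def lower(n):
    """(1 + 1/n)**n as an exact fraction."""
    return Fraction(n + 1, n) ** n

def upper(n):
    """y_n = (1 + 1/n)**(n + 1) as an exact fraction."""
    return Fraction(n + 1, n) ** (n + 1)

def is_decreasing(up_to):
    """Check y_{n-1} > y_n for 2 <= n <= up_to."""
    return all(upper(n - 1) > upper(n) for n in range(2, up_to + 1))
```

The quotient `upper(n) / lower(n)` equals $1 + \frac{1}{n}$, which tends to 1; this is why the two sequences share the same limit. Numerically, `float(lower(1000))` is about 2.7169 and `float(upper(1000))` is about 2.7196.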


d. Subsequences and Partial Limits of a Sequence 


Definition 11. If $x_1, x_2, \ldots, x_n, \ldots$ is a sequence and $n_1 < n_2 < \cdots < n_k < \cdots$ an increasing sequence of natural numbers, then the sequence $x_{n_1}, x_{n_2}, \ldots, x_{n_k}, \ldots$ is called a subsequence of the sequence $\{x_n\}$.

For example, the sequence $1, 3, 5, \ldots$ of positive odd integers in their natural order is a subsequence of the sequence $1, 2, 3, \ldots$. But the sequence $3, 1, 5, 7, 9, \ldots$ is not a subsequence of this sequence.


Lemma 1. (Bolzano–Weierstrass). Every bounded sequence of real numbers contains a convergent subsequence.

Proof. Let $E$ be the set of values of the bounded sequence $\{x_n\}$. If $E$ is finite, there exists a point $x \in E$ and a sequence $n_1 < n_2 < \cdots$ of indices such that $x_{n_1} = x_{n_2} = \cdots = x$. The subsequence $\{x_{n_k}\}$ is constant and hence converges.

If $E$ is infinite, then by the Bolzano–Weierstrass principle it has a limit point $x$. Since $x$ is a limit point of $E$, one can choose $n_1 \in \mathbb{N}$ such that $|x_{n_1} - x| < 1$. If $n_k \in \mathbb{N}$ has been chosen so that $|x_{n_k} - x| < \frac{1}{k}$, then, because $x$ is a limit point of $E$, there exists $n_{k+1} \in \mathbb{N}$ such that $n_k < n_{k+1}$ and $|x_{n_{k+1}} - x| < \frac{1}{k+1}$.

Since $\lim_{k\to\infty}\frac{1}{k} = 0$, the sequence $x_{n_1}, x_{n_2}, \ldots, x_{n_k}, \ldots$ so constructed converges to $x$. □


Definition 12. We shall write $x_n \to +\infty$ and say that the sequence $\{x_n\}$ tends to positive infinity if for each number $c$ there exists $N \in \mathbb{N}$ such that $x_n > c$ for all $n > N$.

Let us write this and two analogous definitions in logical notation:

$$(x_n \to +\infty) := \forall c \in \mathbb{R} \ \exists N \in \mathbb{N} \ \forall n > N \ (c < x_n),$$
$$(x_n \to -\infty) := \forall c \in \mathbb{R} \ \exists N \in \mathbb{N} \ \forall n > N \ (x_n < c),$$
$$(x_n \to \infty) := \forall c \in \mathbb{R} \ \exists N \in \mathbb{N} \ \forall n > N \ (c < |x_n|).$$

In the last two cases we say that the sequence $\{x_n\}$ tends to negative infinity and tends to infinity respectively.

We remark that a sequence may be unbounded and yet not tend to positive infinity, negative infinity, or infinity. An example is $x_n = n^{(-1)^n}$.

Sequences that tend to infinity will not be considered convergent.

It is easy to see that these definitions enable us to supplement Lemma 1, 
stating it in a slightly different form. 


Lemma 2. From each sequence of real numbers one can extract either a 
convergent subsequence or a subsequence that tends to infinity. 


Proof. The new case here occurs when the sequence $\{x_n\}$ is not bounded. Then for each $k \in \mathbb{N}$ we can choose $n_k \in \mathbb{N}$ such that $|x_{n_k}| > k$ and $n_k < n_{k+1}$. We then obtain a subsequence $\{x_{n_k}\}$ that tends to infinity. □


Let $\{x_k\}$ be an arbitrary sequence of real numbers. If it is bounded below, one can consider the sequence $i_n = \inf_{k \ge n} x_k$ (which we have already encountered in proving the Cauchy convergence criterion). Since $i_n \le i_{n+1}$ for any $n \in \mathbb{N}$, either the sequence $\{i_n\}$ has a finite limit $\lim_{n\to\infty} i_n = l$, or $i_n \to +\infty$.

Definition 13. The number $l = \lim_{n\to\infty}\inf_{k \ge n} x_k$ is called the inferior limit of the sequence $\{x_k\}$ and denoted $\varliminf_{k\to\infty} x_k$ or $\liminf_{k\to\infty} x_k$. If $i_n \to +\infty$, it is said that the inferior limit of the sequence equals positive infinity, and we write $\varliminf_{k\to\infty} x_k = +\infty$ or $\liminf_{k\to\infty} x_k = +\infty$. If the original sequence $\{x_k\}$ is not bounded below, then we shall have $i_n = \inf_{k \ge n} x_k = -\infty$ for all $n$. In that case we say that the inferior limit of the sequence equals negative infinity and write $\varliminf_{k\to\infty} x_k = -\infty$ or $\liminf_{k\to\infty} x_k = -\infty$.

Thus, taking account of all the possibilities just enumerated, we can now write down briefly the definition of the inferior limit of a sequence $\{x_k\}$:

$$\varliminf_{k\to\infty} x_k := \lim_{n\to\infty}\inf_{k \ge n} x_k.$$

Similarly, by considering the sequence $s_n = \sup_{k \ge n} x_k$, we arrive at the definition of the superior limit of the sequence $\{x_k\}$:

Definition 14.

$$\varlimsup_{k\to\infty} x_k := \lim_{n\to\infty}\sup_{k \ge n} x_k.$$


We now give several examples: 


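Example 14. $x_k = (-1)^k$, $k \in \mathbb{N}$:

$$\varliminf_{k\to\infty} x_k = \lim_{n\to\infty}\inf_{k \ge n} x_k = \lim_{n\to\infty}\inf_{k \ge n}(-1)^k = \lim_{n\to\infty}(-1) = -1,$$
$$\varlimsup_{k\to\infty} x_k = \lim_{n\to\infty}\sup_{k \ge n} x_k = \lim_{n\to\infty}\sup_{k \ge n}(-1)^k = \lim_{n\to\infty} 1 = 1.$$

Example 15. $x_k = k^{(-1)^k}$, $k \in \mathbb{N}$:

$$\varliminf_{k\to\infty} k^{(-1)^k} = \lim_{n\to\infty}\inf_{k \ge n} k^{(-1)^k} = \lim_{n\to\infty} 0 = 0,$$
$$\varlimsup_{k\to\infty} k^{(-1)^k} = \lim_{n\to\infty}\sup_{k \ge n} k^{(-1)^k} = \lim_{n\to\infty}(+\infty) = +\infty.$$

Example 16. $x_k = k$, $k \in \mathbb{N}$:

$$\varliminf_{k\to\infty} k = \lim_{n\to\infty}\inf_{k \ge n} k = \lim_{n\to\infty} n = +\infty,$$
$$\varlimsup_{k\to\infty} k = \lim_{n\to\infty}\sup_{k \ge n} k = \lim_{n\to\infty}(+\infty) = +\infty.$$

Example 17. $x_k = \frac{(-1)^k}{k}$, $k \in \mathbb{N}$:

$$\varliminf_{k\to\infty}\frac{(-1)^k}{k} = \lim_{n\to\infty}\inf_{k \ge n}\frac{(-1)^k}{k} = \lim_{n\to\infty}\begin{cases} -\frac{1}{n}, & \text{if } n = 2m + 1, \\ -\frac{1}{n+1}, & \text{if } n = 2m \end{cases} = 0,$$
$$\varlimsup_{k\to\infty}\frac{(-1)^k}{k} = \lim_{n\to\infty}\sup_{k \ge n}\frac{(-1)^k}{k} = \lim_{n\to\infty}\begin{cases} \frac{1}{n}, & \text{if } n = 2m, \\ \frac{1}{n+1}, & \text{if } n = 2m + 1 \end{cases} = 0.$$

Example 18. $x_k = -k^2$, $k \in \mathbb{N}$:

$$\varliminf_{k\to\infty}(-k^2) = \lim_{n\to\infty}\inf_{k \ge n}(-k^2) = -\infty.$$

Example 19. $x_k = (-1)^k k$, $k \in \mathbb{N}$:

$$\varliminf_{k\to\infty}(-1)^k k = \lim_{n\to\infty}\inf_{k \ge n}(-1)^k k = \lim_{n\to\infty}(-\infty) = -\infty,$$
$$\varlimsup_{k\to\infty}(-1)^k k = \lim_{n\to\infty}\sup_{k \ge n}(-1)^k k = \lim_{n\to\infty}(+\infty) = +\infty.$$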
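The tail quantities $\inf_{k \ge n} x_k$ and $\sup_{k \ge n} x_k$ can be explored numerically by restricting $k$ to a finite window. This is an assumption of the sketch, not part of the definitions: a finite window gives the exact tail infimum or supremum only when the extreme value is attained inside it, as happens in Examples 14 and 17. The helper names are ours.

```python
from fractions import Fraction

def tail_inf(x, n, window=1000):
    """min of x(k) for n <= k < n + window: a finite stand-in for inf over k >= n."""
    return min(x(k) for k in range(n, n + window))

def tail_sup(x, n, window=1000):
    """max of x(k) for n <= k < n + window: a finite stand-in for sup over k >= n."""
    return max(x(k) for k in range(n, n + window))

def alternating(k):
    """x_k = (-1)**k, as in Example 14."""
    return (-1) ** k

def damped(k):
    """x_k = (-1)**k / k, as in Example 17."""
    return Fraction((-1) ** k, k)
```

For `damped` the computed values reproduce the case distinction of Example 17: the tail infimum is $-\frac{1}{n}$ for odd $n$ and $-\frac{1}{n+1}$ for even $n$. For unbounded sequences such as those of Examples 15, 16, 18, and 19 the windowed values only hint at the infinite tail extremes.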


To explain the origin of the terms “superior” and “inferior” limit of a 
sequence, we make the following definition. 


Definition 15. A number (or the symbol —oo or +00) is called a partial 
limit of a sequence, if the sequence contains a subsequence converging to 
that number. 


Proposition 1. The inferior and superior limits of a bounded sequence are 
respectively the smallest and largest partial limits of the sequence.* 


Proof. Let us prove this, for example, for the inferior limit $i = \varliminf_{k\to\infty} x_k$.

What we know about the sequence $i_n = \inf_{k \ge n} x_k$ is that it is nondecreasing and that $\lim_{n\to\infty} i_n = i \in \mathbb{R}$. For the numbers $n \in \mathbb{N}$, using the definition of the greatest lower bound, we choose by induction numbers $k_n \in \mathbb{N}$ such that $i_n \le x_{k_n} < i_n + \frac{1}{n}$ and $k_n < k_{n+1}$. Since $\lim_{n\to\infty} i_n = \lim_{n\to\infty}\left(i_n + \frac{1}{n}\right) = i$, we can assert, by properties of limits, that $\lim_{n\to\infty} x_{k_n} = i$. We have thus proved that $i$ is a partial limit of the sequence $\{x_k\}$. It is the smallest partial limit since for every $\varepsilon > 0$ there exists $n \in \mathbb{N}$ such that $i - \varepsilon < i_n$, that is, $i - \varepsilon < i_n = \inf_{k \ge n} x_k \le x_k$ for any $k \ge n$.

The inequality $i - \varepsilon < x_k$ for $k \ge n$ means that no partial limit of the sequence can be less than $i - \varepsilon$. But $\varepsilon > 0$ is arbitrary, and hence no partial limit can be less than $i$.

The proof for the superior limit is of course analogous. □


We now remark that if a sequence is not bounded below, then one can select a subsequence of it tending to $-\infty$. But in this case we also have $\varliminf_{k\to\infty} x_k = -\infty$, and we can make the convention that the inferior limit is once again the smallest partial limit. The superior limit may be finite; if so, by what has been proved it must be the largest partial limit. But it may also be infinite. If $\varlimsup_{k\to\infty} x_k = +\infty$, then the sequence is also unbounded from above, and one can select a subsequence tending to $+\infty$. Finally, if $\varlimsup_{k\to\infty} x_k = -\infty$, which is also possible, this means that $\sup_{k \ge n} x_k = s_n \to -\infty$, that is, the sequence $\{x_n\}$ itself tends to $-\infty$, since $s_n \ge x_n$. Similarly, if $\varliminf_{k\to\infty} x_k = +\infty$, then $x_k \to +\infty$.

⁴ Here we are assuming the natural relations $-\infty < x < +\infty$ between the symbols $-\infty$, $+\infty$ and numbers $x \in \mathbb{R}$.

Taking account of what has just been said we deduce the following proposition.


Proposition 1’. For any sequence, the inferior limit is the smallest of its 
partial limits and the superior limit is the largest of its partial limits. 


Corollary 3. A sequence has a limit or tends to negative or positive infinity 
if and only if its inferior and superior limits are the same. 


Proof. The cases when $\varliminf_{k\to\infty} x_k = \varlimsup_{k\to\infty} x_k = +\infty$ or $\varliminf_{k\to\infty} x_k = \varlimsup_{k\to\infty} x_k = -\infty$ have been investigated above, and so we may assume that $\varliminf_{k\to\infty} x_k = \varlimsup_{k\to\infty} x_k = A \in \mathbb{R}$. Since $i_n = \inf_{k \ge n} x_k \le x_n \le \sup_{k \ge n} x_k = s_n$ and by hypothesis $\lim_{n\to\infty} i_n = \lim_{n\to\infty} s_n = A$, we also have $\lim_{n\to\infty} x_n = A$ by properties of limits. □

Corollary 4. A sequence converges if and only if every subsequence of it 
converges. 


Proof. The inferior and superior limits of a subsequence lie between those of 
the sequence itself. If the sequence converges, its inferior and superior limits 
are the same, and so those of the subsequence must also be the same, proving 
that the subsequence converges. Moreover, the limit of the subsequence must 
be the same as that of the sequence itself. 

The converse assertion is obvious, since the subsequence can be chosen as 
the sequence itself. O 


Corollary 5. The Bolzano—Weierstrass Lemma in its restricted and wider 
formulations follows from Propositions 1 and 1’ respectively. 


Proof. Indeed, if the sequence $\{x_k\}$ is bounded, then the points $i = \varliminf_{k\to\infty} x_k$ and $s = \varlimsup_{k\to\infty} x_k$ are finite and, by what has been proved, are partial limits of the sequence. Only when $i = s$ does the sequence have a unique limit point. When $i < s$ there are at least two.

If the sequence is unbounded on one side or the other, there exists a subsequence tending to the corresponding infinity. □


Concluding Remarks We have carried out all three points of the program 
outlined at the beginning of this section (and even gone beyond it in some 
ways). We have given a precise definition of the limit of a sequence, proved 
that the limit is unique, explained the connection between the limit operation 
and the structure of the set of real numbers, and obtained a criterion for 
convergence of a sequence. 

We now study a special type of sequence that is frequently encountered 
and very useful — a series. 
3.1.4 Elementary Facts about Series


a. The Sum of a Series and the Cauchy Criterion for Convergence
of a Series Let $\{a_n\}$ be a sequence of real numbers. We recall that the sum $a_p + a_{p+1} + \cdots + a_q$ $(p \le q)$ is denoted by the symbol $\sum_{n=p}^{q} a_n$. We now wish to give a precise meaning to the expression $a_1 + a_2 + \cdots + a_n + \cdots$, which expresses the sum of all the terms of the sequence $\{a_n\}$.


Definition 16. The expression $a_1 + a_2 + \cdots + a_n + \cdots$ is denoted by the symbol $\sum_{n=1}^{\infty} a_n$ and usually called a series or an infinite series (in order to emphasize its difference from the sum of a finite number of terms).

Definition 17. The elements of the sequence $\{a_n\}$, when regarded as elements of the series, are called the terms of the series. The element $a_n$ is called the $n$th term.

Definition 18. The sum $s_n = \sum_{k=1}^{n} a_k$ is called the partial sum of the series, or, when one wishes to exhibit its index, the $n$th partial sum of the series.⁵

Definition 19. If the sequence $\{s_n\}$ of partial sums of a series converges, we say the series is convergent. If the sequence $\{s_n\}$ does not have a limit, we say the series is divergent.

Definition 20. The limit $\lim_{n\to\infty} s_n = s$ of the sequence of partial sums of the series, if it exists, is called the sum of the series.

It is in this sense that we shall henceforth understand the expression

$$\sum_{n=1}^{\infty} a_n = s.$$

Since convergence of a series is equivalent to convergence of its sequence of partial sums $\{s_n\}$, applying the Cauchy convergence criterion to the sequence $\{s_n\}$ yields the following theorem.


Theorem 6. (The Cauchy convergence criterion for a series). The series $a_1 + \cdots + a_n + \cdots$ converges if and only if for every $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that the inequalities $m \ge n > N$ imply $|a_n + \cdots + a_m| < \varepsilon$.


Corollary 6. If only a finite number of terms of a series are changed, the 
resulting new series will converge if the original series did and diverge if it 
diverged. 


⁵ Thus we are actually defining a series to be an ordered pair $(\{a_n\}, \{s_n\})$ of sequences connected by the relation $\left(s_n = \sum_{k=1}^{n} a_k\right)$ for all $n \in \mathbb{N}$.

Proof. For the proof it suffices to assume that the number $N$ in the Cauchy convergence criterion is larger than the largest index among the terms that were altered. □


Corollary 7. A necessary condition for convergence of the series $a_1 + \cdots + a_n + \cdots$ is that the terms tend to zero as $n \to \infty$, that is, it is necessary that $\lim_{n\to\infty} a_n = 0$.

Proof. It suffices to set $m = n$ in the Cauchy convergence criterion and use the definition of the limit of a sequence. □

Here is another proof: $a_n = s_n - s_{n-1}$, and, given that $\lim_{n\to\infty} s_n = s$, we have $\lim_{n\to\infty} a_n = \lim_{n\to\infty}(s_n - s_{n-1}) = \lim_{n\to\infty} s_n - \lim_{n\to\infty} s_{n-1} = s - s = 0$.


Example 20. The series $1 + q + q^2 + \cdots + q^n + \cdots$ is often called the geometric series. Let us investigate its convergence.

Since $|q^n| = |q|^n$, we have $|q^n| \ge 1$ when $|q| \ge 1$, and in this case the necessary condition for convergence is not met.

Now suppose $|q| < 1$. Then

$$s_n = 1 + q + \cdots + q^{n-1} = \frac{1 - q^n}{1 - q}$$

and $\lim_{n\to\infty} s_n = \frac{1}{1 - q}$, since $\lim_{n\to\infty} q^n = 0$ if $|q| < 1$.

Thus the series $\sum_{n=1}^{\infty} q^{n-1}$ converges if and only if $|q| < 1$, and in that case its sum is $\frac{1}{1 - q}$.
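The closed form for the partial sums of the geometric series can be compared with direct summation. The following sketch uses exact fractions; the function names are ours and the finite checks are purely illustrative.

```python
from fractions import Fraction

def partial_sum(q, n):
    """s_n = 1 + q + ... + q**(n-1), summed term by term."""
    q = Fraction(q)
    return sum((q ** k for k in range(n)), Fraction(0))

def closed_form(q, n):
    """(1 - q**n) / (1 - q), valid for q != 1."""
    q = Fraction(q)
    return (1 - q ** n) / (1 - q)

def distance_to_sum(q, n):
    """|s_n - 1/(1 - q)|, which equals |q|**n / |1 - q|."""
    q = Fraction(q)
    return abs(closed_form(q, n) - 1 / (1 - q))
```

For $q = -\frac{2}{3}$ the distance to the sum $\frac{3}{5}$ after 200 terms is already below $10^{-30}$.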
Example 21. The series $1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} + \cdots$ is called the harmonic series, since each term from the second on is the harmonic mean of the two terms on either side of it (see Exercise 6 at the end of this section).

The terms of the series tend to zero, but the sequence of partial sums

$$s_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n},$$

as was shown in Example 10, diverges. This means that in this case $s_n \to +\infty$ as $n \to \infty$.

Thus the harmonic series diverges.


Example 22. The series $1 - 1 + 1 - \cdots + (-1)^{n+1} + \cdots$ diverges, as can be seen both from the sequence of partial sums $1, 0, 1, 0, \ldots$ and from the fact that the terms do not tend to zero.

If we insert parentheses and consider the new series

$$(1 - 1) + (1 - 1) + \cdots,$$

whose terms are the sums enclosed in parentheses, this new series does converge, and its sum is obviously zero.

If we insert the parentheses in a different way and consider the series

$$1 + (-1 + 1) + (-1 + 1) + \cdots,$$

the result is a convergent series with sum 1.

If we move all the terms that are equal to $-1$ in the original series two places to the right, we obtain the series

$$1 + 1 - 1 + 1 - 1 + 1 - \cdots;$$

we can then, by inserting parentheses, arrive at the series

$$(1 + 1) + (-1 + 1) + (-1 + 1) + \cdots,$$

whose sum equals 2.

These observations show that the usual laws for dealing with finite sums 
can in general not be extended to series. 

There is nevertheless an important type of series that can be handled ex- 
actly like finite sums, as we shall see below. These are the so-called absolutely 
convergent series. They are the ones we shall mainly work with. 


b. Absolute Convergence. The Comparison Theorem 
and its Consequences 


Definition 21. The series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent if the series $\sum_{n=1}^{\infty} |a_n|$ converges.

Since $|a_n + \cdots + a_m| \le |a_n| + \cdots + |a_m|$, the Cauchy convergence criterion implies that an absolutely convergent series converges.

The converse of this statement is generally not true, that is, absolute convergence is a stronger requirement than mere convergence, as one can show by an example.


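Example 23. The series $1 - 1 + \frac{1}{2} - \frac{1}{2} + \frac{1}{3} - \frac{1}{3} + \cdots$, whose partial sums are either $\frac{1}{n}$ or 0, converges to 0.

At the same time, the series of absolute values of its terms

$$1 + 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{3} + \frac{1}{3} + \cdots$$

diverges, as follows from the Cauchy convergence criterion, just as in the case of the harmonic series:

$$\left|\frac{1}{n+1} + \frac{1}{n+1} + \cdots + \frac{1}{n+n} + \frac{1}{n+n}\right| = 2\left(\frac{1}{n+1} + \cdots + \frac{1}{n+n}\right) \ge 2n\cdot\frac{1}{n+n} = 1.$$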
To learn how to determine whether a series converges absolutely or not, it suffices to learn how to investigate the convergence of series with nonnegative terms. The following theorem holds.

Theorem 7. (Criterion for convergence of series of nonnegative terms). A series $a_1 + \cdots + a_n + \cdots$ whose terms are nonnegative converges if and only if the sequence of partial sums is bounded above.

Proof. This follows from the definition of convergence of a series and the criterion for convergence of a nondecreasing sequence, which the sequence of partial sums is, in this case: $s_1 \le s_2 \le \cdots \le s_n \le \cdots$. □


This criterion implies the following simple theorem, which is very useful 
in practice. 


Theorem 8. (Comparison theorem). Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be two series with nonnegative terms. If there exists an index $N \in \mathbb{N}$ such that $a_n \le b_n$ for all $n > N$, then the convergence of the series $\sum_{n=1}^{\infty} b_n$ implies the convergence of $\sum_{n=1}^{\infty} a_n$, and the divergence of $\sum_{n=1}^{\infty} a_n$ implies the divergence of $\sum_{n=1}^{\infty} b_n$.

Proof. Since a finite number of terms has no effect on the convergence of a series, we can assume with no loss of generality that $a_n \le b_n$ for every index $n \in \mathbb{N}$. Then $A_n = \sum_{k=1}^{n} a_k \le \sum_{k=1}^{n} b_k = B_n$. If the series $\sum_{n=1}^{\infty} b_n$ converges, then the sequence $\{B_n\}$, which is nondecreasing, tends to a limit $B$. But then $A_n \le B_n \le B$ for all $n \in \mathbb{N}$, and so the sequence $A_n$ of partial sums of the series $\sum_{n=1}^{\infty} a_n$ is bounded. By the criterion for convergence of a series with nonnegative terms (Theorem 7), the series $\sum_{n=1}^{\infty} a_n$ converges.

The second assertion of the theorem follows from what has just been proved through proof by contradiction. □


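Example 24. Since $\frac{1}{n(n+1)} < \frac{1}{n^2} < \frac{1}{(n-1)n}$ for $n \ge 2$, we conclude that the series $\sum_{n=1}^{\infty}\frac{1}{n^2}$ and $\sum_{n=1}^{\infty}\frac{1}{n(n+1)}$ converge or diverge together.

But the latter series can be summed directly, by observing that $\frac{1}{k(k+1)} = \frac{1}{k} - \frac{1}{k+1}$, and therefore $\sum_{k=1}^{n}\frac{1}{k(k+1)} = 1 - \frac{1}{n+1}$. Hence $\sum_{n=1}^{\infty}\frac{1}{n(n+1)} = 1$. Consequently the series $\sum_{n=1}^{\infty}\frac{1}{n^2}$ converges. It is interesting that $\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6}$, as will be proved below.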
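The telescoping identity and the resulting bound on the partial sums of $\sum \frac{1}{n^2}$ can be checked exactly. In the sketch below the function names are ours; the bound 2 follows from $\frac{1}{k^2} < \frac{1}{(k-1)k}$ for $k \ge 2$ together with the telescoping sum.

```python
from fractions import Fraction

def telescoping_sum(n):
    """Sum of 1/(k(k+1)) for k = 1..n, computed term by term."""
    return sum((Fraction(1, k * (k + 1)) for k in range(1, n + 1)), Fraction(0))

def squares_sum(n):
    """Sum of 1/k**2 for k = 1..n."""
    return sum((Fraction(1, k * k) for k in range(1, n + 1)), Fraction(0))
```

The partial sums `squares_sum(n)` are increasing and stay below 2, which is exactly what Theorem 7 needs for convergence.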
Example 25. It should be observed that the comparison theorem applies only to series with nonnegative terms. Indeed, if we set $a_n = -n$ and $b_n = 0$, for example, we have $a_n < b_n$ and the series $\sum_{n=1}^{\infty} b_n$ converges while $\sum_{n=1}^{\infty} a_n$ diverges.


Corollary 8. (The Weierstrass M-test for absolute convergence). Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be series. Suppose there exists an index $N \in \mathbb{N}$ such that $|a_n| \le b_n$ for all $n > N$. Then a sufficient condition for absolute convergence of the series $\sum_{n=1}^{\infty} a_n$ is that the series $\sum_{n=1}^{\infty} b_n$ converge.

Proof. In fact, by the comparison theorem the series $\sum_{n=1}^{\infty} |a_n|$ will then converge, and that is what is meant by the absolute convergence of $\sum_{n=1}^{\infty} a_n$. □

This important sufficiency test for absolute convergence is often stated briefly as follows: If the terms of a series are majorized (in absolute value) by the terms of a convergent numerical series, then the original series converges absolutely.


Example 26. The series $\sum_{n=1}^{\infty} \frac{\sin n}{n^2}$ converges absolutely, since $\left|\frac{\sin n}{n^2}\right| \le \frac{1}{n^2}$ and
the series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges, as we saw in Example 24.


Corollary 9. (Cauchy’s test). Let $\sum_{n=1}^{\infty} a_n$ be a given series and
$\alpha = \varlimsup_{n \to \infty} \sqrt[n]{|a_n|}$. Then the following are true:

a) if $\alpha < 1$, the series $\sum_{n=1}^{\infty} a_n$ converges absolutely;
b) if $\alpha > 1$, the series $\sum_{n=1}^{\infty} a_n$ diverges;
c) there exist both absolutely convergent and divergent series for which
$\alpha = 1$.


Proof. a) If $\alpha < 1$, we can choose $q \in \mathbb{R}$ such that $\alpha < q < 1$. Fixing $q$, by
definition of the superior limit, we find $N \in \mathbb{N}$ such that $\sqrt[n]{|a_n|} < q$ for all
$n > N$. Thus we shall have $|a_n| < q^n$ for $n > N$, and since the series $\sum_{n=1}^{\infty} q^n$
converges for $|q| < 1$, it follows from the comparison theorem or from the
Weierstrass criterion that the series $\sum_{n=1}^{\infty} a_n$ converges absolutely.

b) Since $\alpha$ is a partial limit of the sequence $\{\sqrt[n]{|a_n|}\}$ (Proposition 1), there
exists a subsequence $\{a_{n_k}\}$ such that $\lim_{k \to \infty} \sqrt[n_k]{|a_{n_k}|} = \alpha$. Hence if $\alpha > 1$,
there exists $K \in \mathbb{N}$ such that $|a_{n_k}| > 1$ for all $k > K$, and so the necessary
condition for convergence ($a_n \to 0$) does not hold for the series $\sum_{n=1}^{\infty} a_n$. It
therefore diverges.


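c) We already know that the series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges and $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges
(absolutely, since $\left|\frac{1}{n^2}\right| = \frac{1}{n^2}$). At the same time,
$\varlimsup_{n \to \infty} \sqrt[n]{\frac{1}{n}} = \lim_{n \to \infty} \frac{1}{\sqrt[n]{n}} = 1$ and
$\varlimsup_{n \to \infty} \sqrt[n]{\frac{1}{n^2}} = \lim_{n \to \infty} \left(\frac{1}{\sqrt[n]{n}}\right)^2 = 1$. □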


Example 27. Let us investigate the values of $x \in \mathbb{R}$ for which the series
$$\sum_{n=1}^{\infty} \left(2 + (-1)^n\right)^n x^n$$
converges.

We compute
$$\alpha = \varlimsup_{n \to \infty} \sqrt[n]{\left|\left(2 + (-1)^n\right)^n x^n\right|} = |x| \varlimsup_{n \to \infty} \left|2 + (-1)^n\right| = 3|x|.$$

Thus for $|x| < \frac{1}{3}$ the series converges and even absolutely, while for $|x| > \frac{1}{3}$
the series diverges. The case $|x| = \frac{1}{3}$ requires separate consideration. In the
present case that is an elementary task, since for $|x| = \frac{1}{3}$ and $n$ even ($n = 2k$),
we have $\left(2 + (-1)^{2k}\right)^{2k} x^{2k} = 3^{2k} \left(\frac{1}{3}\right)^{2k} = 1$. Therefore the series diverges,
since it does not fulfill the necessary condition for convergence.
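The quantity $\alpha$ in Cauchy's test can be explored numerically for the series of Example 27. The sketch below is an illustration under our own assumptions: the helper names are hypothetical, and the maximum over a finite block of indices is used only as a rough stand-in for the upper limit. The roots are computed through logarithms so that large powers do not overflow.

```python
import math


def nth_root_abs_term(n, x):
    """n-th root of |(2 + (-1)^n)^n x^n|, computed via logarithms to avoid overflow."""
    if x == 0:
        return 0.0
    log_abs = n * math.log(2 + (-1) ** n) + n * math.log(abs(x))
    return math.exp(log_abs / n)


def tail_supremum(x, start, stop):
    """Maximum of the n-th roots over start <= n < stop, a finite proxy for the upper limit."""
    return max(nth_root_abs_term(n, x) for n in range(start, stop))


for x in (0.25, 1 / 3, 0.5):
    alpha = tail_supremum(x, 1000, 1100)
    # Values below 1 correspond to absolute convergence, values above 1 to divergence.
    print(x, alpha)
```

For odd $n$ the root equals $|x|$ and for even $n$ it equals $3|x|$, so the sequence of roots has no limit; this is why the test is stated with the upper limit rather than the limit.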


Corollary 10. (d’Alembert’s⁶ test). Suppose the limit $\lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = \alpha$
exists for the series $\sum_{n=1}^{\infty} a_n$. Then,

a) if $\alpha < 1$, the series $\sum_{n=1}^{\infty} a_n$ converges absolutely;
b) if $\alpha > 1$, the series $\sum_{n=1}^{\infty} a_n$ diverges;
c) there exist both absolutely convergent and divergent series for which
$\alpha = 1$.


Proof. a) If $\alpha < 1$, there exists a number $q$ such that $\alpha < q < 1$. Fixing $q$
and using properties of limits, we find an index $N \in \mathbb{N}$ such that $\left|\frac{a_{n+1}}{a_n}\right| < q$
for $n > N$. Since a finite number of terms has no effect on the convergence
of a series, we shall assume without loss of generality that $\left|\frac{a_{n+1}}{a_n}\right| < q$ for all
$n \in \mathbb{N}$.

Since
$$\left|\frac{a_{n+1}}{a_n}\right| \cdot \left|\frac{a_n}{a_{n-1}}\right| \cdots \left|\frac{a_2}{a_1}\right| = \left|\frac{a_{n+1}}{a_1}\right|,$$
we find that $|a_{n+1}| < |a_1| \cdot q^n$. But the series $\sum_{n=1}^{\infty} |a_1| q^n$ converges (its sum is
obviously $\frac{|a_1| q}{1 - q}$), so that the series $\sum_{n=1}^{\infty} a_n$ converges absolutely.

b) If $\alpha > 1$, then from some index $N \in \mathbb{N}$ on we have $\left|\frac{a_{n+1}}{a_n}\right| > 1$, that is,
$|a_n| < |a_{n+1}|$, and the condition $a_n \to 0$, which is necessary for convergence,
does not hold for the series $\sum_{n=1}^{\infty} a_n$.

c) As in the case of Cauchy’s test, the series $\sum_{n=1}^{\infty} \frac{1}{n}$ and $\sum_{n=1}^{\infty} \frac{1}{n^2}$ provide
examples. □

6 J. L. d'Alembert (1717-1783): French scholar specializing in mechanics. He was
a member of the group of philosophes who wrote the Encyclopédie.


Example 28. Let us determine the values of $x \in \mathbb{R}$ for which the series
$$\sum_{n=1}^{\infty} \frac{1}{n!} x^n$$
converges.
For $x = 0$ it obviously converges absolutely.
For $x \ne 0$ we have $\lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n \to \infty} \frac{|x|}{n+1} = 0$.

Thus, this series converges absolutely for every value of $x \in \mathbb{R}$.
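A short numerical sketch makes the behaviour in Example 28 concrete. It is an illustration only: the function names are ours, and the comparison with `math.exp` relies on the standard fact, not established at this point of the text, that $\sum_{n=0}^{\infty} x^n/n! = e^x$.

```python
import math


def ratio(n, x):
    """|a_{n+1} / a_n| for a_n = x^n / n!, which simplifies to |x| / (n + 1)."""
    return abs(x) / (n + 1)


def partial_sum(x, n_terms):
    """Sum of x^n / n! for n = 1..n_terms, building each term from the previous one."""
    term = 1.0
    total = 0.0
    for n in range(1, n_terms + 1):
        term *= x / n
        total += term
    return total


x = 5.0
# The ratios tend to 0 even though the first few exceed 1.
print([round(ratio(n, x), 4) for n in (1, 10, 100, 1000)])
# The series starts at n = 1, so its sum differs from exp(x) by the n = 0 term.
print(partial_sum(x, 60), math.exp(x) - 1)
```

Note that for $x = 5$ the ratio exceeds 1 for the first few indices; only the limiting value matters in d'Alembert's test.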


Finally, let us consider another special, but frequently encountered class 
of series, namely those whose terms form a monotonic sequence. For such 
series we have the following necessary and sufficient condition: 


Proposition 2. (Cauchy). If $a_1 \ge a_2 \ge \cdots \ge 0$, the series $\sum_{n=1}^{\infty} a_n$ converges
if and only if the series $\sum_{k=0}^{\infty} 2^k a_{2^k} = a_1 + 2a_2 + 4a_4 + 8a_8 + \cdots$ converges.


Proof. Since
$$\begin{aligned}
a_2 &\le a_2 \le a_1, \\
2a_4 &\le a_3 + a_4 \le 2a_2, \\
4a_8 &\le a_5 + a_6 + a_7 + a_8 \le 4a_4, \\
&\;\;\vdots \\
2^n a_{2^{n+1}} &\le a_{2^n+1} + \cdots + a_{2^{n+1}} \le 2^n a_{2^n},
\end{aligned}$$
by adding these inequalities, we find
$$\frac{1}{2}\left(S_{n+1} - a_1\right) \le A_{2^{n+1}} - a_1 \le S_n,$$
where $A_k = a_1 + \cdots + a_k$ and $S_n = a_1 + 2a_2 + \cdots + 2^n a_{2^n}$ are the partial sums
of the two series in question. The sequences $\{A_k\}$ and $\{S_n\}$ are nondecreasing,
and hence from these inequalities one can conclude that they are either
both bounded above or both unbounded above. Then, by the criterion for
convergence of series with nonnegative terms, it follows that the two series
indeed converge or diverge together. □
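The two-sided estimate $\frac{1}{2}(S_{n+1} - a_1) \le A_{2^{n+1}} - a_1 \le S_n$ obtained in the proof can be checked exactly for concrete nonincreasing sequences. The sketch below is an illustration with hypothetical helper names; it uses the sequences $1/k$ and $1/k^2$ as test cases.

```python
from fractions import Fraction


def check_condensation(a, n):
    """Check (S_{n+1} - a_1)/2 <= A_{2^(n+1)} - a_1 <= S_n for a nonincreasing a(k) >= 0."""
    a1 = a(1)
    big_a = sum(a(k) for k in range(1, 2 ** (n + 1) + 1))   # A_{2^(n+1)}
    s_n = sum(2 ** j * a(2 ** j) for j in range(0, n + 1))   # S_n
    s_next = s_n + 2 ** (n + 1) * a(2 ** (n + 1))            # S_{n+1}
    return (s_next - a1) / 2 <= big_a - a1 <= s_n


def harmonic(k):
    return Fraction(1, k)


def inverse_square(k):
    return Fraction(1, k * k)


for n in range(0, 8):
    assert check_condensation(harmonic, n)
    assert check_condensation(inverse_square, n)

# For a_k = 1/k every condensed term 2^j * a_{2^j} equals 1, so S_n = n + 1 is unbounded.
print([sum(2 ** j * harmonic(2 ** j) for j in range(n + 1)) for n in range(5)])
```

For $a_k = 1/k$ the condensed series has all terms equal to 1, which makes the divergence of the harmonic series visible at a glance.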


This result implies a useful corollary. 


Corollary. The series $\sum_{n=1}^{\infty} \frac{1}{n^p}$ converges for $p > 1$ and diverges for $p \le 1$.⁷

Proof. If $p \ge 0$, the proposition implies that the series converges or diverges
simultaneously with the series
$$\sum_{k=0}^{\infty} 2^k \frac{1}{(2^k)^p} = \sum_{k=0}^{\infty} \left(2^{1-p}\right)^k,$$
and a necessary and sufficient condition for the convergence of this series is
that $q = 2^{1-p} < 1$, that is, $p > 1$.

If $p \le 0$, the divergence of the series $\sum_{n=1}^{\infty} \frac{1}{n^p}$ is obvious, since all the terms
of the series are at least 1. □

The importance of this corollary is that the series $\sum_{n=1}^{\infty} \frac{1}{n^p}$ is often used as
a comparison series to study the convergence of other series.


c. The Number e as the Sum of a Series To conclude our study of series 
we return once again to the number e and obtain a series that provides a very
convenient way of computing it. 

We shall use Newton’s binomial formula to expand the expression 
$\left(1 + \frac{1}{n}\right)^n$. Those who are unfamiliar with this formula from high school and
have not solved part g) of Exercise 1 in Sect. 2.2 may omit the present ap- 
pendix on the number e with no loss of continuity and return to it after 
studying Taylor’s formula, of which Newton’s binomial formula may be re- 
garded as a special case. 

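We know that $e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$.

By Newton’s binomial formula
$$\left(1 + \frac{1}{n}\right)^n = 1 + \frac{n}{1!}\,\frac{1}{n} + \frac{n(n-1)}{2!}\,\frac{1}{n^2} + \cdots + \frac{n(n-1)\cdots(n-k+1)}{k!}\,\frac{1}{n^k} + \cdots + \frac{1}{n^n} =$$
$$= 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n}\right) + \cdots + \frac{1}{k!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right) + \cdots + \frac{1}{n!}\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{n-1}{n}\right).$$

Setting $\left(1 + \frac{1}{n}\right)^n = e_n$ and $1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!} = s_n$, we thus have
$$e_n \le s_n \quad (n = 1, 2, \ldots).$$

7 Up to now in this book the number $n^p$ has been defined formally only for rational
values of p, so that for the moment the reader is entitled to take this proposition
as applying only to values of p for which $n^p$ is defined.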

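On the other hand, for any fixed $k$ and $n > k$, as can be seen from the
same expansion, we have
$$1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n}\right) + \cdots + \frac{1}{k!}\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right) < e_n.$$

As $n \to \infty$ the left-hand side of this inequality tends to $s_k$, and the right-hand
side to $e$. We can now conclude that $s_k \le e$ for all $k \in \mathbb{N}$.
But then from the relations
$$e_n \le s_n \le e$$
we find that $\lim_{n \to \infty} s_n = e$.

In accordance with the definition of the sum of a series, we can now write
$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!} + \cdots.$$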

This representation of the number e is very well adapted for computation. 
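Let us estimate the difference $e - s_n$:
$$0 < e - s_n = \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \cdots = \frac{1}{(n+1)!}\left[1 + \frac{1}{n+2} + \frac{1}{(n+2)(n+3)} + \cdots\right] <$$
$$< \frac{1}{(n+1)!}\left[1 + \frac{1}{n+2} + \frac{1}{(n+2)^2} + \cdots\right] = \frac{1}{(n+1)!} \cdot \frac{1}{1 - \frac{1}{n+2}} = \frac{n+2}{n!\,(n+1)^2} < \frac{1}{n!\,n}.$$

Thus, in order to make the absolute error in the approximation of $e$ by
$s_n$ less than, say, $10^{-3}$, it suffices that $\frac{1}{n!\,n} < \frac{1}{1000}$. This condition is already
satisfied by $s_6$.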
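The estimate $0 < e - s_n < \frac{1}{n!\,n}$ can be tested directly. The following Python sketch is illustrative (the function names are ours): it computes $s_n$ exactly with rational arithmetic, finds the first index at which the bound drops below a given tolerance, and compares the true error against the bound. Since `math.e` is only a double-precision approximation, the comparison is restricted to small $n$.

```python
import math
from fractions import Fraction


def s(n):
    """Partial sum s_n = 1 + 1/1! + 1/2! + ... + 1/n! as an exact fraction."""
    return sum(Fraction(1, math.factorial(k)) for k in range(0, n + 1))


def error_bound(n):
    """The bound 1/(n! n) for e - s_n derived above."""
    return Fraction(1, math.factorial(n) * n)


def first_index_with_bound_below(tolerance):
    """Smallest n with 1/(n! n) < tolerance."""
    n = 1
    while error_bound(n) >= tolerance:
        n += 1
    return n


print(first_index_with_bound_below(Fraction(1, 1000)))
for n in range(1, 11):
    # math.e is a double-precision value, so the check is only meaningful for small n.
    actual = math.e - float(s(n))
    assert 0 < actual < float(error_bound(n))
    print(n, actual, float(error_bound(n)))
```

The printed columns show that the bound is quite tight: the ratio of the true error to $\frac{1}{n!\,n}$ approaches 1 as $n$ grows.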

Let us write out the first few decimal digits of e: 


e = 2.7182818284590... . 
This estimate of the difference $e - s_n$ can be written as the equality
$$e = s_n + \frac{\theta_n}{n!\,n}, \quad \text{where } 0 < \theta_n < 1.$$
It follows immediately from this representation of $e$ that it is irrational.

Indeed, if we assume that $e = \frac{p}{q}$ where $p, q \in \mathbb{N}$, then the number $q!\,e$ must be
an integer, while
$$q!\,e = q!\left(s_q + \frac{\theta_q}{q!\,q}\right) = q! + \frac{q!}{1!} + \frac{q!}{2!} + \cdots + \frac{q!}{q!} + \frac{\theta_q}{q},$$
and then the number $\frac{\theta_q}{q}$ would have to be an integer, which is impossible.


For the reader’s information we note that e is not only irrational, but also 
transcendental. 


3.1.5 Problems and Exercises 


1. Show that a number $x \in \mathbb{R}$ is rational if and only if its $q$-ary expression in any
base q is periodic, that is, from some rank on it consists of periodically repeating 
digits. 

2. A ball that has fallen from height $h$ bounces to height $qh$, where $q$ is a constant
coefficient 0 < q < 1. Find the time that elapses until it comes to rest and the 
distance it travels through the air during that time. 


3. We mark all the points on a circle obtained from a fixed point by rotations of the 
circle through angles of $n$ radians, where $n \in \mathbb{Z}$ ranges over all integers. Describe
all the limit points of the set so constructed. 


4. The expression
$$n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3 + \cdots + \cfrac{1}{n_{k-1} + \cfrac{1}{n_k}}}},$$
where $n_k \in \mathbb{N}$, is called a finite continued fraction, and the expression
$$n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3 + \cdots}}$$
is called an infinite continued fraction. The fractions obtained from a continued
fraction by omitting all its elements from a certain one on are called the convergents.
The value assigned to an infinite continued fraction is the limit of its convergents.
Show that:

a) Every rational number $\frac{m}{n}$, where $m, n \in \mathbb{N}$, can be expanded in a unique
manner as a continued fraction:
$$\frac{m}{n} = q_1 + \cfrac{1}{q_2 + \cfrac{1}{q_3 + \cdots + \cfrac{1}{q_n}}},$$
assuming that $q_n \ne 1$ for $n > 1$.

Hint: The numbers $q_1, \ldots, q_n$, called the incomplete quotients or elements, can
be obtained from the Euclidean algorithm
$$\begin{aligned}
m &= n \cdot q_1 + r_1, \\
n &= r_1 \cdot q_2 + r_2, \\
r_1 &= r_2 \cdot q_3 + r_3, \\
&\;\;\vdots
\end{aligned}$$
by writing it in the form
$$\frac{m}{n} = q_1 + \frac{1}{n / r_1} = q_1 + \cfrac{1}{q_2 + \cdots}.$$


b) The convergents $R_1 = q_1$, $R_2 = q_1 + \frac{1}{q_2}, \ldots$ satisfy the inequalities
$$R_1 < R_3 < \cdots < R_{2k-1} < \frac{m}{n} < R_{2k} < R_{2k-2} < \cdots < R_2.$$


c) The numerators $P_k$ and denominators $Q_k$ of the convergents are formed
according to the following rule:
$$P_k = P_{k-1} q_k + P_{k-2}, \quad P_2 = q_1 q_2 + 1, \quad P_1 = q_1,$$
$$Q_k = Q_{k-1} q_k + Q_{k-2}, \quad Q_2 = q_2, \quad Q_1 = 1.$$

d) The difference of successive convergents can be computed from the formula
$$R_k - R_{k-1} = \frac{(-1)^k}{Q_k Q_{k-1}} \quad (k > 1).$$


e) Every infinite continued fraction has a determinate value. 
f) The value of an infinite continued fraction is irrational. 


g)
$$\frac{1 + \sqrt{5}}{2} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}.$$


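h) The Fibonacci numbers 1, 1, 2, 3, 5, 8, ... (that is, $u_n = u_{n-1} + u_{n-2}$ and
$u_1 = u_2 = 1$), which are obtained as the denominators of the convergents in g), are
given by the formula
$$u_n = \frac{1}{\sqrt{5}}\left[\left(\frac{1 + \sqrt{5}}{2}\right)^n - \left(\frac{1 - \sqrt{5}}{2}\right)^n\right].$$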


Qk Q2 V5’ 
Compare this result with the assertions of Exercise 11 in Sect. 2.2. 


i) The convergents Ry, = at in g) are such that | 
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5. Show that 
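a) the equality
$$1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!} + \frac{1}{n!\,n} = 3 - \frac{1}{1 \cdot 2 \cdot 2!} - \cdots - \frac{1}{(n-1) \cdot n \cdot n!}$$
holds for $n \ge 2$;

b) $e = 3 - \sum_{n=0}^{\infty} \frac{1}{(n+1)(n+2)(n+2)!}$;

c) for computing the number $e$ approximately the formula
$e \approx 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!} + \frac{1}{n!\,n}$ is much better than the original formula
$e \approx 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!}$.
(Estimate the errors, and compare the result with the value of e given on p. 103).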


6. If $a$ and $b$ are positive numbers and $p$ an arbitrary nonzero real number, then
the mean of order p of the numbers $a$ and $b$ is the quantity
$$S_p(a, b) = \left(\frac{a^p + b^p}{2}\right)^{1/p}.$$

In particular for $p = 1$ we obtain the arithmetic mean of $a$ and $b$, for $p = 2$ their
square-mean, and for $p = -1$ their harmonic mean.

a) Show that the mean $S_p(a, b)$ of any order lies between the numbers $a$ and $b$.

b) Find the limits of the sequences
$$\{S_n(a, b)\}, \quad \{S_{-n}(a, b)\}.$$


7. Show that if $a > 0$, the sequence $x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right)$ converges to the square
root of $a$ for any $x_1 > 0$.
Estimate the rate of convergence, that is, the magnitude of the absolute error
$|x_n - \sqrt{a}| = |\Delta_n|$ as a function of $n$.


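8. Show that

a)
$$\begin{aligned}
S_0(n) &= 1^0 + \cdots + n^0 = n, \\
S_1(n) &= 1^1 + \cdots + n^1 = \frac{n(n+1)}{2} = \frac{1}{2}n^2 + \frac{1}{2}n, \\
S_2(n) &= 1^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} = \frac{1}{3}n^3 + \frac{1}{2}n^2 + \frac{1}{6}n, \\
S_3(n) &= \frac{n^2(n+1)^2}{4} = \frac{1}{4}n^4 + \frac{1}{2}n^3 + \frac{1}{4}n^2,
\end{aligned}$$
and in general that
$$S_k(n) = a_{k+1} n^{k+1} + \cdots + a_1 n + a_0$$
is a polynomial in $n$ of degree $k + 1$.

b) $\lim_{n \to \infty} \frac{S_k(n)}{n^{k+1}} = \frac{1}{k+1}$.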
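Before attempting a proof of Problem 8, it can be reassuring to test the stated formulas numerically. The sketch below is a sanity check, not a proof; the function names are ours. It verifies the closed forms of part a) for $k = 0, 1, 2, 3$ and prints the ratio from part b) for one large value of $n$.

```python
from fractions import Fraction


def power_sum(k, n):
    """S_k(n) = 1^k + 2^k + ... + n^k, computed directly."""
    return sum(j ** k for j in range(1, n + 1))


def closed_form(k, n):
    """Closed forms for k = 0, 1, 2, 3 listed in part a)."""
    n = Fraction(n)
    formulas = {
        0: n,
        1: n * (n + 1) / 2,
        2: n * (n + 1) * (2 * n + 1) / 6,
        3: n ** 2 * (n + 1) ** 2 / 4,
    }
    return formulas[k]


for k in range(4):
    for n in range(1, 50):
        assert power_sum(k, n) == closed_form(k, n)

# Part b): the ratio S_k(n) / n^(k+1) should be close to 1/(k+1) for large n.
for k in range(6):
    n = 10_000
    print(k, float(Fraction(power_sum(k, n), n ** (k + 1))), 1 / (k + 1))
```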
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3.2 The Limit of a Function 


3.2.1 Definitions and Examples 


Let $E$ be a subset of $\mathbb{R}$ and $a$ a limit point of $E$. Let $f : E \to \mathbb{R}$ be a
real-valued function defined on $E$.

We wish to write out what it means to say that the value $f(x)$ of the
function $f$ approaches some number $A$ as the point $x \in E$ approaches $a$. It
is natural to call such a number $A$ the limit of the values of the function $f$,
or the limit of $f$ as $x$ tends to $a$.


Definition 1. We shall say (following Cauchy) that the function $f : E \to \mathbb{R}$
tends to $A$ as $x$ tends to $a$, or that $A$ is the limit of $f$ as $x$ tends to $a$, if for
every $\varepsilon > 0$ there exists $\delta > 0$ such that $|f(x) - A| < \varepsilon$ for every $x \in E$ such
that $0 < |x - a| < \delta$.

In logical symbolism these conditions are written as
$$\forall \varepsilon > 0\ \exists \delta > 0\ \forall x \in E\ \left(0 < |x - a| < \delta \Rightarrow |f(x) - A| < \varepsilon\right).$$

If $A$ is the limit of $f(x)$ as $x$ tends to $a$ in the set $E$, we write $f(x) \to A$ as
$x \to a$, $x \in E$, or $\lim_{x \to a,\, x \in E} f(x) = A$. Instead of the expression $x \to a$, $x \in E$,
we shall as a rule use the shorter notation $E \ni x \to a$, and instead of
$\lim_{x \to a,\, x \in E} f(x)$ we shall write $\lim_{E \ni x \to a} f(x) = A$.


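Example 1. Let $E = \mathbb{R} \setminus 0$, and $f(x) = x \sin \frac{1}{x}$. We shall verify that
$$\lim_{E \ni x \to 0} x \sin \frac{1}{x} = 0.$$

Indeed, for a given $\varepsilon > 0$ we choose $\delta = \varepsilon$. Then for $0 < |x| < \delta = \varepsilon$,
taking account of the inequality $\left|x \sin \frac{1}{x}\right| \le |x|$, we shall have $\left|x \sin \frac{1}{x}\right| < \varepsilon$.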
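The choice $\delta = \varepsilon$ in Example 1 can be spot-checked by sampling. The Python sketch below is only an illustration (random sampling proves nothing, and the function names, sample count, and seed are our own choices): it draws points from the deleted $\delta$-neighborhood of 0 and confirms that $|f(x) - 0| < \varepsilon$ at each of them.

```python
import math
import random


def f(x):
    """f(x) = x sin(1/x), defined for x != 0."""
    return x * math.sin(1.0 / x)


def check_epsilon_delta(eps, samples=10_000, seed=0):
    """Sample points with 0 < |x| < delta = eps and test |f(x) - 0| < eps."""
    rng = random.Random(seed)
    delta = eps
    for _ in range(samples):
        x = rng.uniform(-delta, delta)
        if x == 0.0 or abs(x) >= delta:
            continue  # skip points outside the deleted delta-neighborhood
        if not abs(f(x)) < eps:
            return False
    return True


for eps in (1.0, 1e-3, 1e-9):
    assert check_epsilon_delta(eps)
```

The point $x = 0$ is skipped explicitly, mirroring the strict inequality $0 < |x - a|$ in Definition 1.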


Incidentally, one can see from this example that a function $f : E \to \mathbb{R}$
may have a limit as $E \ni x \to a$ without even being defined at the point $a$
itself. This is exactly the situation that most often arises when limits must
be computed; and, if you were paying attention, you may have noticed that
this circumstance is taken into account in our definition of limit, where we
wrote the strict inequality $0 < |x - a|$.

We recall that a neighborhood of a point $a \in \mathbb{R}$ is any open interval
containing the point.

Definition 2. A deleted neighborhood of a point is a neighborhood of the
point from which the point itself has been removed.

If $U(a)$ denotes a neighborhood of $a$, we shall denote the corresponding
deleted neighborhood by $\mathring{U}(a)$.
The sets
$$U_E(a) := E \cap U(a),$$
$$\mathring{U}_E(a) := E \cap \mathring{U}(a)$$
will be called respectively a neighborhood of $a$ in $E$ and a deleted neighborhood
of $a$ in $E$.
If $a$ is a limit point of $E$, then $\mathring{U}_E(a) \ne \varnothing$ for every neighborhood $U(a)$.


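If we temporarily adopt the cumbersome symbols $\mathring{U}_E^{\delta}(a)$ and $V_{\mathbb{R}}^{\varepsilon}(A)$ to
denote the deleted $\delta$-neighborhood of $a$ in $E$ and the $\varepsilon$-neighborhood of $A$ in
$\mathbb{R}$, then Cauchy’s so-called “$\varepsilon$-$\delta$-definition” of the limit of a function can be
rewritten as
$$\left(\lim_{E \ni x \to a} f(x) = A\right) := \forall V_{\mathbb{R}}^{\varepsilon}(A)\ \exists \mathring{U}_E^{\delta}(a)\ \left(f\big(\mathring{U}_E^{\delta}(a)\big) \subset V_{\mathbb{R}}^{\varepsilon}(A)\right).$$

This expression says that $A$ is the limit of the function $f : E \to \mathbb{R}$ as $x$
tends to $a$ in the set $E$ if for every $\varepsilon$-neighborhood $V_{\mathbb{R}}^{\varepsilon}(A)$ of $A$ there exists
a deleted neighborhood $\mathring{U}_E^{\delta}(a)$ of $a$ in $E$ whose image $f\big(\mathring{U}_E^{\delta}(a)\big)$ under the
mapping $f : E \to \mathbb{R}$ is entirely contained in $V_{\mathbb{R}}^{\varepsilon}(A)$.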

Taking into account that every neighborhood of a point on the real line 
contains a symmetric neighborhood (a $\delta$-neighborhood) of the same point,
we arrive at the following expression for the definition of a limit, which we 
shall take as our main definition: 


Definition 3.
$$\left(\lim_{E \ni x \to a} f(x) = A\right) := \forall V_{\mathbb{R}}(A)\ \exists \mathring{U}_E(a)\ \left(f\big(\mathring{U}_E(a)\big) \subset V_{\mathbb{R}}(A)\right).$$

Thus the number $A$ is called the limit of the function $f : E \to \mathbb{R}$ as $x$ tends
to $a$ while remaining in the set $E$ ($a$ must be a limit point of $E$) if for every
neighborhood of $A$ there is a deleted neighborhood of $a$ in $E$ whose image
under the mapping $f : E \to \mathbb{R}$ is contained in the given neighborhood of $A$.

We have given several statements of the definition of the limit of a func- 
tion. For numerical functions, when a and A belong to R, as we have seen, 
these statements are equivalent. In this connection, we note that one or an- 
other of these statements may be more convenient in different situations. For 
example, the original form is convenient in numerical computations, since it 
shows the allowable magnitude of the deviation of x from a needed to ensure 
that the deviation of $f(x)$ from $A$ will not exceed a specified value. But from
the point of view of extending the concept of a limit to more general functions
the last statement of the definition is most convenient. It shows that we
can define the concept of a limit of a mapping $f : X \to Y$ provided we have
been told what is meant by a neighborhood of a point in $X$ and $Y$, that is,
as we say, a topology is given on $X$ and $Y$.

Let us consider a few more examples that are illustrative of the main 
definition. 


Example 2. The function
$$\operatorname{sgn} x = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0 \end{cases}$$
(read “signum x”⁸) is defined on the whole real line. We shall show that it
has no limit as $x$ tends to 0. The nonexistence of this limit is expressed by
$$\forall A \in \mathbb{R}\ \exists V(A)\ \forall \mathring{U}(0)\ \exists x \in \mathring{U}(0)\ \left(f(x) \notin V(A)\right),$$
that is, no matter what $A$ we take (claiming to be the limit of $\operatorname{sgn} x$ as $x \to 0$),
there is a neighborhood $V(A)$ of $A$ such that no matter how small a deleted
neighborhood $\mathring{U}(0)$ of 0 we take, that deleted neighborhood contains at least
one point $x$ at which the value of the function does not lie in $V(A)$.

Since $\operatorname{sgn} x$ assumes only the values $-1$, 0, and 1, it is clear that no number
distinct from them can be the limit of the function. For any such number has
a neighborhood that does not contain any of these three numbers.

But if $A \in \{-1, 0, 1\}$ we choose as $V(A)$ the $\varepsilon$-neighborhood of $A$ with
$\varepsilon = \frac{1}{2}$. The points $-1$ and 1 certainly cannot both lie in this neighborhood.
But, no matter what deleted neighborhood $\mathring{U}(0)$ of 0 we may take, that
neighborhood contains both positive and negative numbers, that is, points $x$
where $f(x) = 1$ and points where $f(x) = -1$.

Hence there is a point $x \in \mathring{U}(0)$ such that $f(x) \notin V(A)$.


If the function $f : E \to \mathbb{R}$ is defined on a whole deleted neighborhood of
a point $a \in \mathbb{R}$, that is, when $\mathring{U}_E(a) = \mathring{U}_{\mathbb{R}}(a) = \mathring{U}(a)$, we shall agree to write
more briefly $x \to a$ instead of $E \ni x \to a$.
Example 3. Let us show that $\lim_{x \to 0} |\operatorname{sgn} x| = 1$.

Indeed, for $x \in \mathbb{R} \setminus 0$ we have $|\operatorname{sgn} x| = 1$, that is, the function is constant
and equal to 1 in any deleted neighborhood $\mathring{U}(0)$ of 0. Hence for any
neighborhood $V(1)$ we obtain $f\big(\mathring{U}(0)\big) = 1 \in V(1)$.

8 The Latin word for sign.

Note carefully that although the function $|\operatorname{sgn} x|$ is defined at the point
0 itself and $|\operatorname{sgn} 0| = 0$, this value has no influence on the value of the limit
in question. Thus one must not confuse the value $f(a)$ of the function at the
point $a$ with the limit $\lim_{x \to a} f(x)$ that the function has as $x \to a$.


Let $\mathbb{R}_-$ and $\mathbb{R}_+$ be the sets of negative and positive numbers respectively.


Example 4. We saw in Example 2 that the limit $\lim_{\mathbb{R} \ni x \to 0} \operatorname{sgn} x$ does not exist.
Remarking, however, that the restriction $\operatorname{sgn}|_{\mathbb{R}_-}$ of sgn to $\mathbb{R}_-$ is a constant
function equal to $-1$ and $\operatorname{sgn}|_{\mathbb{R}_+}$ is a constant function equal to 1, we can
show, as in Example 3, that
$$\lim_{\mathbb{R}_- \ni x \to 0} \operatorname{sgn} x = -1 \quad \text{and} \quad \lim_{\mathbb{R}_+ \ni x \to 0} \operatorname{sgn} x = 1,$$
that is, the restrictions of the same function to different sets may have different
limits at the same point, or even fail to have a limit, as shown in
Example 2.
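Sampling sgn along points that approach 0 from each side gives a concrete picture of Examples 2 and 4. The sketch below is illustrative only; the three sample sequences are our own choice.

```python
def sgn(x):
    """The signum function of Example 2."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0


positive = [1 / n for n in range(1, 8)]              # points of R_+ tending to 0
negative = [-1 / n for n in range(1, 8)]             # points of R_- tending to 0
alternating = [(-1) ** n / n for n in range(1, 8)]   # points of R \ 0 tending to 0

print([sgn(x) for x in positive])      # constant on R_+
print([sgn(x) for x in negative])      # constant on R_-
print([sgn(x) for x in alternating])   # keeps switching sign
```

On each of the sets $\mathbb{R}_+$ and $\mathbb{R}_-$ the sampled values are constant, while along the alternating points they keep switching between $-1$ and 1, which is the behaviour that rules out a limit on $\mathbb{R} \setminus 0$.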


Example 5. Developing the idea of Example 2, one can show similarly that
$\sin \frac{1}{x}$ has no limit as $x \to 0$.

Indeed, in any deleted neighborhood $\mathring{U}(0)$ of 0 there are always points of
the form $\frac{1}{-\pi/2 + 2\pi n}$ and $\frac{1}{\pi/2 + 2\pi n}$, where $n \in \mathbb{N}$. At these points the function
assumes the values $-1$ and 1 respectively. But these two numbers cannot
both lie in the $\varepsilon$-neighborhood $V(A)$ of a point $A \in \mathbb{R}$ if $\varepsilon < 1$. Hence no
number $A \in \mathbb{R}$ can be the limit of this function as $x \to 0$.


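Example 6. If
$$E_- = \left\{ x \in \mathbb{R} \,\Big|\, x = \frac{1}{-\pi/2 + 2\pi n},\ n \in \mathbb{N} \right\}$$
and
$$E_+ = \left\{ x \in \mathbb{R} \,\Big|\, x = \frac{1}{\pi/2 + 2\pi n},\ n \in \mathbb{N} \right\},$$
then, as shown in Example 4, we find that
$$\lim_{E_- \ni x \to 0} \sin \frac{1}{x} = -1 \quad \text{and} \quad \lim_{E_+ \ni x \to 0} \sin \frac{1}{x} = 1.$$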
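The sets $E_-$ and $E_+$ of Example 6 are easy to tabulate. The following Python sketch is an illustration with hypothetical helper names: it evaluates $\sin \frac{1}{x}$ at the $n$-th points of both sets for growing $n$, up to floating-point rounding.

```python
import math


def e_minus(n):
    """n-th point of E_-: x = 1 / (-pi/2 + 2 pi n)."""
    return 1.0 / (-math.pi / 2 + 2 * math.pi * n)


def e_plus(n):
    """n-th point of E_+: x = 1 / (pi/2 + 2 pi n)."""
    return 1.0 / (math.pi / 2 + 2 * math.pi * n)


for n in (1, 10, 100, 1000):
    x_minus, x_plus = e_minus(n), e_plus(n)
    # Both points shrink toward 0, yet sin(1/x) stays near -1 on E_- and near 1 on E_+.
    print(n, x_minus, math.sin(1 / x_minus), x_plus, math.sin(1 / x_plus))
```

Both families of points lie in every deleted neighborhood of 0 once $n$ is large enough, which is exactly the observation used in Example 5.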


There is a close connection between the concept of the limit of a sequence 
studied in the preceding section and the limit of an arbitrary numerical- 
valued function introduced in the present section, expressed by the following 
proposition. 




Proposition 1.⁹ The relation $\lim_{E \ni x \to a} f(x) = A$ holds if and only if for every
sequence $\{x_n\}$ of points $x_n \in E \setminus a$ converging to $a$, the sequence $\{f(x_n)\}$
converges to $A$.


Proof. The fact that $\left(\lim_{E \ni x \to a} f(x) = A\right) \Rightarrow \left(\lim_{n \to \infty} f(x_n) = A\right)$ follows
immediately from the definitions. Indeed, if $\lim_{E \ni x \to a} f(x) = A$, then for any
neighborhood $V(A)$ of $A$ there exists a deleted neighborhood $\mathring{U}_E(a)$ of the
point $a$ in $E$ such that for $x \in \mathring{U}_E(a)$ we have $f(x) \in V(A)$. If the sequence
$\{x_n\}$ of points in $E \setminus a$ converges to $a$, there exists an index $N$ such that
$x_n \in \mathring{U}_E(a)$ for $n > N$, and then $f(x_n) \in V(A)$. By definition of the limit of
a sequence, we then conclude that $\lim_{n \to \infty} f(x_n) = A$.

We now prove the converse. If $A$ is not the limit of $f(x)$ as $E \ni x \to a$,
then there exists a neighborhood $V(A)$ such that for any $n \in \mathbb{N}$, there is a
point $x_n$ in the deleted $\frac{1}{n}$-neighborhood of $a$ in $E$ such that $f(x_n) \notin V(A)$.
But this means that the sequence $\{f(x_n)\}$ does not converge to $A$, even
though $\{x_n\}$ converges to $a$. □


3.2.2 Properties of the Limit of a Function 


We now establish a number of properties of the limit of a function that are 
constantly being used. Many of them are analogous to the properties of the 
limit of a sequence that we have already established, and for that reason 
are essentially already known to us. Moreover, by Proposition 1 just proved, 
many properties of the limit of a function follow obviously and immediately 
from the corresponding properties of the limit of a sequence: the uniqueness 
of the limit, the arithmetic properties of the limit, and passage to the limit 
in inequalities. Nevertheless, we shall carry out all the proofs again. As will 
be seen, there is some value in doing so. 

We call the reader’s attention to the fact that, in order to establish the 
properties of the limit of a function, we need only two properties of deleted 
neighborhoods of a limit point of a set: 


$B_1$) $\mathring{U}_E(a) \ne \varnothing$, that is, the deleted neighborhood of the point in $E$ is
nonempty;

$B_2$) $\forall \mathring{U}'_E(a)\ \forall \mathring{U}''_E(a)\ \exists \mathring{U}_E(a)\ \left(\mathring{U}_E(a) \subset \mathring{U}'_E(a) \cap \mathring{U}''_E(a)\right)$,
that is, the intersection of any pair of deleted neighborhoods contains a
deleted neighborhood. This observation leads us to a general concept of a
limit of a function and the possibility of using the theory of limits in the
future not only for functions defined on sets of numbers. To keep the discussion
from becoming a mere repetition of what was said in Sect. 3.1, we shall
employ some useful new devices and concepts that were not proved in that
section.

9 This proposition is sometimes called the statement of the equivalence of the
Cauchy definition of a limit (in terms of neighborhoods) and the Heine definition
(in terms of sequences).
E. Heine (1821-1881): German mathematician.


a. General Properties of the Limit of a Function We begin with some 
definitions.


Definition 4. As before, a function $f : E \to \mathbb{R}$ assuming only one value
is called constant. A function $f : E \to \mathbb{R}$ is called ultimately constant as
$E \ni x \to a$ if it is constant in some deleted neighborhood $\mathring{U}_E(a)$, where $a$ is
a limit point of $E$.

Definition 5. A function $f : E \to \mathbb{R}$ is bounded, bounded above, or bounded
below respectively if there is a number $C \in \mathbb{R}$ such that $|f(x)| < C$, $f(x) < C$,
or $C < f(x)$ for all $x \in E$.

If one of these three relations holds only in some deleted neighborhood
$\mathring{U}_E(a)$, the function is said to be ultimately bounded, ultimately bounded above,
or ultimately bounded below as $E \ni x \to a$ respectively.

Example 7. The function $f(x) = \left(\sin \frac{1}{x} + x \cos \frac{1}{x}\right)$ defined by this formula
for $x \ne 0$ is not bounded on its domain of definition, but it is ultimately
bounded as $x \to 0$.

Example 8. The same is true for the function $f(x) = x$ on $\mathbb{R}$.


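Theorem 1. a) ($f : E \to \mathbb{R}$ is ultimately the constant $A$ as $E \ni x \to a$) $\Rightarrow$
$\left(\lim_{E \ni x \to a} f(x) = A\right)$.

b) $\left(\exists \lim_{E \ni x \to a} f(x)\right) \Rightarrow$ ($f : E \to \mathbb{R}$ is ultimately bounded as $E \ni x \to a$).

c) $\left(\lim_{E \ni x \to a} f(x) = A_1\right) \wedge \left(\lim_{E \ni x \to a} f(x) = A_2\right) \Rightarrow (A_1 = A_2)$.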


Proof. The assertion a) that an ultimately constant function has a limit, 
and assertion b) that a function having a limit is ultimately bounded, follow 
immediately from the corresponding definitions. We now turn to the proof of 
the uniqueness of the limit. 

Suppose $A_1 \ne A_2$. Choose neighborhoods $V(A_1)$ and $V(A_2)$ having no
points in common, that is, $V(A_1) \cap V(A_2) = \varnothing$. By definition of a limit, we
have
$$\lim_{E \ni x \to a} f(x) = A_1 \Rightarrow \exists \mathring{U}'_E(a)\ \left(f\big(\mathring{U}'_E(a)\big) \subset V(A_1)\right),$$
$$\lim_{E \ni x \to a} f(x) = A_2 \Rightarrow \exists \mathring{U}''_E(a)\ \left(f\big(\mathring{U}''_E(a)\big) \subset V(A_2)\right).$$

We now take a deleted neighborhood $\mathring{U}_E(a)$ of $a$ (which is a limit point of
$E$) such that $\mathring{U}_E(a) \subset \mathring{U}'_E(a) \cap \mathring{U}''_E(a)$. (For example, we could take $\mathring{U}_E(a) =
\mathring{U}'_E(a) \cap \mathring{U}''_E(a)$, since this intersection is also a deleted neighborhood.)

Since $\mathring{U}_E(a) \ne \varnothing$, we take $x \in \mathring{U}_E(a)$. We then have $f(x) \in V(A_1) \cap V(A_2)$,
which is impossible since the neighborhoods $V(A_1)$ and $V(A_2)$ have no points
in common. □


b. Passage to the Limit and Arithmetic Operations 


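Definition 6. If two numerical-valued functions $f : E \to \mathbb{R}$ and $g : E \to \mathbb{R}$
have a common domain of definition $E$, their sum, product, and quotient are
respectively the functions defined on the same set by the following formulas:
$$(f + g)(x) := f(x) + g(x),$$
$$(f \cdot g)(x) := f(x) \cdot g(x),$$
$$\left(\frac{f}{g}\right)(x) := \frac{f(x)}{g(x)}, \quad \text{if } g(x) \ne 0 \text{ for } x \in E.$$

Theorem 2. Let $f : E \to \mathbb{R}$ and $g : E \to \mathbb{R}$ be two functions with a common
domain of definition.
If $\lim_{E \ni x \to a} f(x) = A$ and $\lim_{E \ni x \to a} g(x) = B$, then

a) $\lim_{E \ni x \to a} (f + g)(x) = A + B$;

b) $\lim_{E \ni x \to a} (f \cdot g)(x) = A \cdot B$;

c) $\lim_{E \ni x \to a} \left(\frac{f}{g}\right)(x) = \frac{A}{B}$, if $B \ne 0$ and $g(x) \ne 0$ for $x \in E$.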

As already noted at the beginning of Subsect. 3.2.2, this theorem is an
immediate consequence of the corresponding theorem on limits of sequences,
given Proposition 1. The theorem can also be obtained by repeating the
proof of the theorem on the algebraic properties of the limit of a sequence.
The changes needed in the proof in order to do this reduce to referring to
some deleted neighborhood $\mathring{U}_E(a)$ of $a$ in $E$, where previously we had referred
to statements holding “from some $N \in \mathbb{N}$ on”. We advise the reader to verify
this.

Here we shall obtain the theorem from its simplest special case when
$A = B = 0$. Of course assertion c) will then be excluded from consideration.

A function $\alpha : E \to \mathbb{R}$ is said to be infinitesimal as $E \ni x \to a$ if
$$\lim_{E \ni x \to a} \alpha(x) = 0.$$

Proposition 2. a) If $\alpha : E \to \mathbb{R}$ and $\beta : E \to \mathbb{R}$ are infinitesimal functions
as $E \ni x \to a$, then their sum $\alpha + \beta : E \to \mathbb{R}$ is also infinitesimal as
$E \ni x \to a$.

b) If $\alpha : E \to \mathbb{R}$ and $\beta : E \to \mathbb{R}$ are infinitesimal functions as $E \ni x \to a$,
then their product $\alpha \cdot \beta : E \to \mathbb{R}$ is also infinitesimal as $E \ni x \to a$.

c) If $\alpha : E \to \mathbb{R}$ is infinitesimal as $E \ni x \to a$ and $\beta : E \to \mathbb{R}$ is ultimately
bounded as $E \ni x \to a$, then the product $\alpha \cdot \beta : E \to \mathbb{R}$ is infinitesimal as
$E \ni x \to a$.


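Proof. a) We shall verify that
$$\left(\lim_{E \ni x \to a} \alpha(x) = 0\right) \wedge \left(\lim_{E \ni x \to a} \beta(x) = 0\right) \Rightarrow \left(\lim_{E \ni x \to a} (\alpha + \beta)(x) = 0\right).$$

Let $\varepsilon > 0$ be given. By definition of the limit, we have
$$\left(\lim_{E \ni x \to a} \alpha(x) = 0\right) \Rightarrow \left(\exists \mathring{U}'_E(a)\ \forall x \in \mathring{U}'_E(a)\ \left(|\alpha(x)| < \frac{\varepsilon}{2}\right)\right),$$
$$\left(\lim_{E \ni x \to a} \beta(x) = 0\right) \Rightarrow \left(\exists \mathring{U}''_E(a)\ \forall x \in \mathring{U}''_E(a)\ \left(|\beta(x)| < \frac{\varepsilon}{2}\right)\right).$$

Then for the deleted neighborhood $\mathring{U}_E(a) \subset \mathring{U}'_E(a) \cap \mathring{U}''_E(a)$ we obtain
$$\forall x \in \mathring{U}_E(a) \quad |(\alpha + \beta)(x)| = |\alpha(x) + \beta(x)| \le |\alpha(x)| + |\beta(x)| < \varepsilon.$$

That is, we have verified that $\lim_{E \ni x \to a} (\alpha + \beta)(x) = 0$.

b) This assertion is a special case of assertion c), since every function that
has a limit is ultimately bounded.

c) We shall verify that
$$\left(\lim_{E \ni x \to a} \alpha(x) = 0\right) \wedge \left(\exists M \in \mathbb{R}\ \exists \mathring{U}_E(a)\ \forall x \in \mathring{U}_E(a)\ \left(|\beta(x)| < M\right)\right) \Rightarrow \left(\lim_{E \ni x \to a} \alpha(x)\beta(x) = 0\right).$$

Let $\varepsilon > 0$ be given. By definition of limit we have
$$\left(\lim_{E \ni x \to a} \alpha(x) = 0\right) \Rightarrow \left(\exists \mathring{U}'_E(a)\ \forall x \in \mathring{U}'_E(a)\ \left(|\alpha(x)| < \frac{\varepsilon}{M}\right)\right).$$

Then for the deleted neighborhood $\mathring{U}''_E(a) \subset \mathring{U}'_E(a) \cap \mathring{U}_E(a)$, we obtain
$$\forall x \in \mathring{U}''_E(a) \quad |(\alpha \cdot \beta)(x)| = |\alpha(x)\beta(x)| = |\alpha(x)|\,|\beta(x)| < \frac{\varepsilon}{M} \cdot M = \varepsilon.$$

Thus we have verified that $\lim_{E \ni x \to a} \alpha(x)\beta(x) = 0$. □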


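The following remark is very useful:

Remark 1.
$$\left(\lim_{E \ni x \to a} f(x) = A\right) \Longleftrightarrow \left(f(x) = A + \alpha(x) \ \wedge \lim_{E \ni x \to a} \alpha(x) = 0\right).$$

In other words, the function $f : E \to \mathbb{R}$ tends to $A$ if and only if it can
be represented as a sum $A + \alpha(x)$, where $\alpha(x)$ is infinitesimal as $E \ni x \to a$.
(The function $\alpha(x)$ is the deviation of $f(x)$ from $A$.)¹⁰

This remark follows immediately from the definition of limit, by virtue of
which
$$\lim_{E \ni x \to a} f(x) = A \Longleftrightarrow \lim_{E \ni x \to a} \left(f(x) - A\right) = 0.$$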

We now give the proof of the theorem on the arithmetic properties of the 
limit of a function, based on this remark and the properties of infinitesimal 
functions that we have established. 


Proof. a) If $\lim_{E \ni x \to a} f(x) = A$ and $\lim_{E \ni x \to a} g(x) = B$, then $f(x) = A + \alpha(x)$ and
$g(x) = B + \beta(x)$, where $\alpha(x)$ and $\beta(x)$ are infinitesimal as $E \ni x \to a$. Then
$(f + g)(x) = f(x) + g(x) = A + \alpha(x) + B + \beta(x) = (A + B) + \gamma(x)$, where
$\gamma(x) = \alpha(x) + \beta(x)$, being the sum of two infinitesimals, is infinitesimal as
$E \ni x \to a$.
Thus $\lim_{E \ni x \to a} (f + g)(x) = A + B$.

b) Again representing $f(x)$ and $g(x)$ in the form $f(x) = A + \alpha(x)$, $g(x) =
B + \beta(x)$, we have
$$(f \cdot g)(x) = f(x)g(x) = \left(A + \alpha(x)\right)\left(B + \beta(x)\right) = A \cdot B + \gamma(x),$$
where $\gamma(x) = A\beta(x) + B\alpha(x) + \alpha(x)\beta(x)$ is infinitesimal as $E \ni x \to a$
because of the properties just proved for such functions.
Thus, $\lim_{E \ni x \to a} (f \cdot g)(x) = A \cdot B$.


c) We once again write $f(x) = A + \alpha(x)$ and $g(x) = B + \beta(x)$, where
$\lim_{E \ni x \to a} \alpha(x) = 0$ and $\lim_{E \ni x \to a} \beta(x) = 0$.
Since $B \ne 0$, there exists a deleted neighborhood $\mathring{U}_E(a)$, at all points of
which $|\beta(x)| < \frac{|B|}{2}$, and hence $|g(x)| = |B + \beta(x)| \ge |B| - |\beta(x)| > \frac{|B|}{2}$.
Then in $\mathring{U}_E(a)$ we shall also have $\frac{1}{|g(x)|} < \frac{2}{|B|}$, that is, the function $\frac{1}{g(x)}$ is
ultimately bounded as $E \ni x \to a$. We then write
$$\left(\frac{f}{g}\right)(x) - \frac{A}{B} = \frac{f(x)}{g(x)} - \frac{A}{B} = \frac{A + \alpha(x)}{B + \beta(x)} - \frac{A}{B} = \frac{1}{g(x)} \cdot \frac{1}{B}\left(B\alpha(x) - A\beta(x)\right) = \gamma(x).$$

By the properties of infinitesimals (taking account of the ultimate boundedness
of $\frac{1}{g(x)}$) we find that the function $\gamma(x)$ is infinitesimal as $E \ni x \to a$.

Thus we have proved that $\lim_{E \ni x \to a} \left(\frac{f}{g}\right)(x) = \frac{A}{B}$. □

10 Here is a curious detail. This very obvious representation, which is nevertheless
very useful on the computational level, was specially noted by the French mathematician
and specialist in mechanics Lazare Carnot (1753-1823), a revolutionary
general and academician, the father of Sadi Carnot (1796-1832), who in turn was
the creator of thermodynamics.
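The algebraic identity behind part c), $\frac{A + \alpha}{B + \beta} - \frac{A}{B} = \frac{1}{g} \cdot \frac{1}{B}(B\alpha - A\beta)$ with $g = B + \beta$, can be confirmed with exact rational arithmetic. The sketch below is an illustration: the sample values of $A$, $B$, $\alpha$, $\beta$ and the function names are our own assumptions, chosen so that $|\beta| < |B|/2$ as in the proof.

```python
from fractions import Fraction


def gamma(A, B, alpha, beta):
    """Right-hand side: (1/g) * (1/B) * (B*alpha - A*beta), with g = B + beta."""
    g = B + beta
    return (B * alpha - A * beta) / (g * B)


def quotient_deviation(A, B, alpha, beta):
    """Left-hand side: f/g - A/B with f = A + alpha, g = B + beta."""
    return (A + alpha) / (B + beta) - A / B


A, B = Fraction(3), Fraction(-2)
for k in range(1, 8):
    alpha = Fraction(1, 10 ** k)
    beta = Fraction(-1, 3 * 10 ** k)
    assert quotient_deviation(A, B, alpha, beta) == gamma(A, B, alpha, beta)
    # The conditions used in the proof: |beta| < |B|/2, hence |g| > |B|/2.
    assert abs(beta) < abs(B) / 2 and abs(B + beta) > abs(B) / 2
    print(k, float(gamma(A, B, alpha, beta)))
```

As $\alpha$ and $\beta$ shrink, the printed deviation shrinks with them, in line with the claim that $\gamma(x)$ is infinitesimal.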


c. Passage to the Limit and Inequalities 


Theorem 3. a) If the functions $f : E \to \mathbb{R}$ and $g : E \to \mathbb{R}$ are such that
$\lim_{E \ni x \to a} f(x) = A$ and $\lim_{E \ni x \to a} g(x) = B$ and $A < B$, then there exists a deleted
neighborhood $\mathring{U}_E(a)$ of $a$ in $E$ at each point of which $f(x) < g(x)$.

b) If the relations $f(x) \le g(x) \le h(x)$ hold for the functions $f : E \to \mathbb{R}$,
$g : E \to \mathbb{R}$, and $h : E \to \mathbb{R}$, and if $\lim_{E \ni x \to a} f(x) = \lim_{E \ni x \to a} h(x) = C$, then the
limit of $g(x)$ exists as $E \ni x \to a$, and $\lim_{E \ni x \to a} g(x) = C$.

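Proof. a) Choose a number $C$ such that $A < C < B$. By definition of limit, we
find deleted neighborhoods $\mathring{U}'_E(a)$ and $\mathring{U}''_E(a)$ of $a$ in $E$ such that $|f(x) - A| <
C - A$ for $x \in \mathring{U}'_E(a)$ and $|g(x) - B| < B - C$ for $x \in \mathring{U}''_E(a)$. Then at any
point of a deleted neighborhood $\mathring{U}_E(a)$ contained in $\mathring{U}'_E(a) \cap \mathring{U}''_E(a)$, we find
$$f(x) < A + (C - A) = C = B - (B - C) < g(x).$$

b) If $\lim_{E \ni x \to a} f(x) = \lim_{E \ni x \to a} h(x) = C$, then for any fixed $\varepsilon > 0$ there exist
deleted neighborhoods $\mathring{U}'_E(a)$ and $\mathring{U}''_E(a)$ of $a$ in $E$ such that $C - \varepsilon < f(x)$
for $x \in \mathring{U}'_E(a)$ and $h(x) < C + \varepsilon$ for $x \in \mathring{U}''_E(a)$. Then at any point of a
deleted neighborhood $\mathring{U}_E(a)$ contained in $\mathring{U}'_E(a) \cap \mathring{U}''_E(a)$, we have $C - \varepsilon <
f(x) \le g(x) \le h(x) < C + \varepsilon$, that is, $|g(x) - C| < \varepsilon$, and consequently
$\lim_{E \ni x \to a} g(x) = C$. □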


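Corollary. Suppose $\lim_{E \ni x \to a} f(x) = A$ and $\lim_{E \ni x \to a} g(x) = B$. Let $\mathring{U}_E(a)$ be a
deleted neighborhood of $a$ in $E$.

a) If $f(x) > g(x)$ for all $x \in \mathring{U}_E(a)$, then $A \ge B$;

b) If $f(x) \ge g(x)$ for all $x \in \mathring{U}_E(a)$, then $A \ge B$;

c) If $f(x) > B$ for all $x \in \mathring{U}_E(a)$, then $A \ge B$;

d) If $f(x) \ge B$ for all $x \in \mathring{U}_E(a)$, then $A \ge B$.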
Proof. Using proof by contradiction, we immediately obtain assertions a) and
b) of the corollary from assertion a) of Theorem 3. Assertions c) and d) follow
from a) and b) by taking $g(x) \equiv B$. □


d. Two Important Examples Before developing the theory of the limit of 
a function further, we shall illustrate the use of the theorems just proved by 
two important examples. 


Example 9. $\lim_{x \to 0} \frac{\sin x}{x} = 1$.


Here we shall appeal to the definition of sin x given in high school, that 
is, sin x is the ordinate of the point to which the point (1,0) moves under a 
rotation of x radians about the origin. The completeness of such a definition 
is entirely a matter of the care with which the connection between rotations 
and real numbers is established. Since the system of real numbers itself was 
not described in sufficient detail in high school, one may consider that we 
need to sharpen the definition of sin x (and the same is true of cos x).

We shall do so at the appropriate time and justify the reasoning that for 
now will rely on intuition. 

a) We shall show that

$$\cos^2 x < \frac{\sin x}{x} < 1 \quad \text{for } 0 < |x| < \frac{\pi}{2} .$$


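Proof. Since $\cos^2 x$ and $\frac{\sin x}{x}$ are even functions, it suffices to consider the case
$0 < x < \pi/2$. By Fig. 3.1 and the definition of $\cos x$ and $\sin x$, comparing the
areas of the sector $\sphericalangle OCD$, the triangle $\triangle OAB$, and the sector $\sphericalangle OAB$, we
have

$$\begin{aligned}
S_{\sphericalangle OCD} &= \tfrac{1}{2} |OC| \cdot |\overset{\frown}{CD}| = \tfrac{1}{2} (\cos x)(x \cos x) = \tfrac{1}{2} x \cos^2 x < \\
&< S_{\triangle OAB} = \tfrac{1}{2} |OA| \cdot |BC| = \tfrac{1}{2} \cdot 1 \cdot \sin x = \tfrac{1}{2} \sin x < \\
&< S_{\sphericalangle OAB} = \tfrac{1}{2} |OA| \cdot |\overset{\frown}{AB}| = \tfrac{1}{2} \cdot 1 \cdot x = \tfrac{1}{2} x .
\end{aligned}$$

[Fig. 3.1. The unit circle with the points $A = (1, 0)$ and $B = (\cos x, \sin x)$.]

Dividing these inequalities by $\frac{1}{2} x$, we find that the result is what was
asserted. □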


b) It follows from a) that
$$|\sin x| \le |x|$$
for any $x \in \mathbb{R}$, equality holding only at $x = 0$.
Proof. For $0 < |x| < \pi/2$, as shown in a), we have
$$|\sin x| < |x| .$$

But $|\sin x| \le 1$, so that this last inequality also holds for $|x| \ge \pi/2 > 1$. Only
for $x = 0$ do we find $\sin x = x = 0$. □


c) It follows from b) that

$$\lim_{x \to 0} \sin x = 0 .$$

Proof. Since $0 \le |\sin x| \le |x|$ and $\lim_{x \to 0} |x| = 0$, we find by the theorem on the
limit of a function and inequalities (Theorem 3) that $\lim_{x \to 0} |\sin x| = 0$, so that
$\lim_{x \to 0} \sin x = 0$. □
d) We shall now prove that $\lim_{x \to 0} \frac{\sin x}{x} = 1$.

Proof. Assuming that $|x| < \pi/2$, from the inequality in a) we have

$$1 - \sin^2 x < \frac{\sin x}{x} < 1 .$$

But $\lim_{x \to 0} (1 - \sin^2 x) = 1 - \lim_{x \to 0} \sin x \cdot \lim_{x \to 0} \sin x = 1 - 0 = 1$, so that by
the theorem on passage to the limit and inequalities, we conclude that
$\lim_{x \to 0} \frac{\sin x}{x} = 1$. □
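The two-sided estimate of part a), which drives the limit in part d), can be observed numerically. The following Python sketch is only an illustration and not part of the proof; the name `squeeze_bounds` is ours, and floating-point numbers stand in for real numbers.

```python
import math

def squeeze_bounds(x):
    """Return (cos^2 x, sin x / x, 1) for a nonzero x with |x| < pi/2."""
    return math.cos(x) ** 2, math.sin(x) / x, 1.0

for x in (1.0, 0.1, 0.01, -0.01):
    lower, middle, upper = squeeze_bounds(x)
    # Part a) predicts lower < middle < upper on 0 < |x| < pi/2.
    print(f"x = {x:>6}: {lower:.8f} < {middle:.8f} < {upper}")
```

As $x$ approaches 0 the lower bound rises toward 1, so the middle term is forced toward 1 as well, which is the content of Theorem 3 b) in this case.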
Example 10. Definition of the exponential, logarithmic, and power functions 
using limits. We shall now illustrate how the high-school definition of the 
exponential and logarithmic functions can be completed by means of the 
theory of real numbers and limits. 

For convenience in reference and to give a complete picture, we shall start 
from the beginning. 

a) The exponential function. Let a > 1. 


1° For $n \in \mathbb{N}$ we define inductively $a^1 := a$, $a^{n+1} := a^n \cdot a$.
In this way we obtain a function $a^n$ defined on $\mathbb{N}$, which, as can be seen
from the definition, has the property

$$\frac{a^m}{a^n} = a^{m-n}$$

if $m, n \in \mathbb{N}$ and $m > n$.


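2° This property leads to the natural definitions

$$a^0 := 1 , \qquad a^{-n} := \frac{1}{a^n} \quad \text{for } n \in \mathbb{N} ,$$

which, when carried out, extend the function $a^n$ to the set $\mathbb{Z}$ of all integers,
and then

$$a^m \cdot a^n = a^{m+n}$$

for any $m, n \in \mathbb{Z}$.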


3° In the theory of real numbers we have observed that for $a > 0$ and
$n \in \mathbb{N}$ there exists a unique $n$th root of $a$, that is, a number $x > 0$ such that
$x^n = a$. For that number we use the notation $a^{1/n}$. It is convenient, since it
allows us to retain the law of addition for exponents:

$$a = a^1 = \bigl(a^{1/n}\bigr)^n = a^{1/n} \cdots a^{1/n} = a^{1/n + \cdots + 1/n} .$$

For the same reason it is natural to set $a^{m/n} := (a^{1/n})^m$ and $a^{-1/n} := (a^{1/n})^{-1}$
for $n \in \mathbb{N}$ and $m \in \mathbb{Z}$. If it turns out that $a^{(mk)/(nk)} = a^{m/n}$ for
$k \in \mathbb{Z}$, we can consider that we have defined $a^r$ for $r \in \mathbb{Q}$.


4° For numbers $0 < x$, $0 < y$, we verify by induction that for $n \in \mathbb{N}$
$$(x < y) \Leftrightarrow (x^n < y^n) ,$$

so that, in particular,
$$(x = y) \Leftrightarrow (x^n = y^n) .$$


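5° This makes it possible to prove the rules for operating with rational
exponents, in particular, that

$$a^{(mk)/(nk)} = a^{m/n} \quad \text{for } k \in \mathbb{Z}$$

and
$$a^{m_1/n_1} \cdot a^{m_2/n_2} = a^{m_1/n_1 + m_2/n_2} .$$

Proof. Indeed, $a^{(mk)/(nk)} > 0$ and $a^{m/n} > 0$. Further, since

$$\bigl(a^{(mk)/(nk)}\bigr)^{nk} = \Bigl(\bigl(a^{1/(nk)}\bigr)^{mk}\Bigr)^{nk} = \bigl(a^{1/(nk)}\bigr)^{mk \cdot nk} = \Bigl(\bigl(a^{1/(nk)}\bigr)^{nk}\Bigr)^{mk} = a^{mk}$$

and
$$\bigl(a^{m/n}\bigr)^{nk} = \Bigl(\bigl(a^{1/n}\bigr)^{n}\Bigr)^{mk} = a^{mk} ,$$

it follows that the first of the equalities that needed to be verified is now
established in view of point 4°.
Similarly, since

$$\bigl(a^{m_1/n_1} \cdot a^{m_2/n_2}\bigr)^{n_1 n_2} = \bigl(a^{m_1/n_1}\bigr)^{n_1 n_2} \cdot \bigl(a^{m_2/n_2}\bigr)^{n_1 n_2} = \Bigl(\bigl(a^{1/n_1}\bigr)^{n_1}\Bigr)^{m_1 n_2} \cdot \Bigl(\bigl(a^{1/n_2}\bigr)^{n_2}\Bigr)^{m_2 n_1} = a^{m_1 n_2} \cdot a^{m_2 n_1} = a^{m_1 n_2 + m_2 n_1}$$

and
$$\bigl(a^{m_1/n_1 + m_2/n_2}\bigr)^{n_1 n_2} = \bigl(a^{(m_1 n_2 + m_2 n_1)/(n_1 n_2)}\bigr)^{n_1 n_2} = \Bigl(\bigl(a^{1/(n_1 n_2)}\bigr)^{n_1 n_2}\Bigr)^{m_1 n_2 + m_2 n_1} = a^{m_1 n_2 + m_2 n_1} ,$$

the second equality is also proved. □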


Thus we have defined $a^r$ for $r \in \mathbb{Q}$ and $a^r > 0$; and for any $r_1, r_2 \in \mathbb{Q}$,

$$a^{r_1} \cdot a^{r_2} = a^{r_1 + r_2} .$$


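6° It follows from 4° that for $r_1, r_2 \in \mathbb{Q}$
$$(r_1 < r_2) \Rightarrow (a^{r_1} < a^{r_2}) .$$

Proof. Since $(1 < a) \Leftrightarrow (1 < a^{1/n})$ for $n \in \mathbb{N}$, which follows immediately
from 4°, we have $(a^{1/n})^m = a^{m/n} > 1$ for $n, m \in \mathbb{N}$, as again follows from 4°.
Thus for $1 < a$ and $r > 0$, $r \in \mathbb{Q}$, we have $a^r > 1$.

Then for $r_1 < r_2$ we obtain by 5°

$$a^{r_2} = a^{r_1} \cdot a^{r_2 - r_1} > a^{r_1} \cdot 1 = a^{r_1} . \ \square$$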
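7° We shall show that for $r_0 \in \mathbb{Q}$

$$\lim_{\mathbb{Q} \ni r \to r_0} a^r = a^{r_0} .$$

Proof. We shall verify that $a^p \to 1$ as $\mathbb{Q} \ni p \to 0$. This follows from the fact
that for $|p| < \frac{1}{n}$ we have by 6°

$$a^{-1/n} < a^p < a^{1/n} .$$

We know that $a^{1/n} \to 1$ (and $a^{-1/n} \to 1$) as $n \to \infty$. Then by standard
reasoning we verify that for $\varepsilon > 0$ there exists $\delta > 0$ such that for $|p| < \delta$ we
have
$$1 - \varepsilon < a^p < 1 + \varepsilon .$$

We can take $\frac{1}{n}$ as $\delta$ here if $1 - \varepsilon < a^{-1/n}$ and $a^{1/n} < 1 + \varepsilon$.
We now prove the main assertion.
Given $\varepsilon > 0$, we choose $\delta$ so that

$$1 - \varepsilon a^{-r_0} < a^p < 1 + \varepsilon a^{-r_0}$$
for $|p| < \delta$. If now $|r - r_0| < \delta$, we have
$$a^{r_0} \bigl(1 - \varepsilon a^{-r_0}\bigr) < a^r = a^{r_0} \cdot a^{r - r_0} < a^{r_0} \bigl(1 + \varepsilon a^{-r_0}\bigr) ,$$

which says
$$a^{r_0} - \varepsilon < a^r < a^{r_0} + \varepsilon . \ \square$$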


Thus we have defined a function $a^r$ on $\mathbb{Q}$ having the following properties:

$a^1 = a > 1$;

$a^{r_1} \cdot a^{r_2} = a^{r_1 + r_2}$;

$a^{r_1} < a^{r_2}$ for $r_1 < r_2$;

$a^{r_1} \to a^{r_2}$ as $\mathbb{Q} \ni r_1 \to r_2$.
We now extend this function to the entire real line as follows. 
8° Let $x \in \mathbb{R}$, $s = \sup_{\mathbb{Q} \ni r < x} a^r$, and $i = \inf_{\mathbb{Q} \ni r > x} a^r$. It is clear that $s, i \in \mathbb{R}$,
since for $r_1 < x < r_2$ we have $a^{r_1} < a^{r_2}$.

We shall show that actually $s = i$ (and then we shall denote this common
value by $a^x$).

Proof. By definition of $s$ and $i$ we have
$$a^{r_1} \le s \le i \le a^{r_2}$$

for $r_1 < x < r_2$. Then $0 \le i - s \le a^{r_2} - a^{r_1} = a^{r_1} (a^{r_2 - r_1} - 1) < s (a^{r_2 - r_1} - 1)$.
But $a^p \to 1$ as $\mathbb{Q} \ni p \to 0$, so that for any $\varepsilon > 0$ there exists $\delta > 0$ such that
$a^{r_2 - r_1} - 1 < \varepsilon / s$ for $0 < r_2 - r_1 < \delta$. We then find that $0 \le i - s < \varepsilon$, and
since $\varepsilon > 0$ is arbitrary, we conclude that $i = s$. □

We now define $a^x := s = i$.


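9° Let us show that $a^x = \lim_{\mathbb{Q} \ni r \to x} a^r$.

Proof. Taking 8° into account, for $\varepsilon > 0$ we find $r' < x$ such that
$s - \varepsilon < a^{r'} \le s = a^x$ and $r'' > x$ such that $a^x = i \le a^{r''} < i + \varepsilon$. Since $r' < r < r''$ implies
$a^{r'} < a^r < a^{r''}$, we then have, for all $r \in \mathbb{Q}$ in the open interval $]r', r''[$,

$$a^x - \varepsilon < a^r < a^x + \varepsilon . \ \square$$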


We now study the properties of the function $a^x$ so defined on $\mathbb{R}$.

10° For $x_1, x_2 \in \mathbb{R}$ and $a > 1$, $(x_1 < x_2) \Rightarrow (a^{x_1} < a^{x_2})$.
Proof. On the open interval $]x_1, x_2[$ there exist two rational numbers $r_1 < r_2$.
If $x_1 \le r_1 < r_2 \le x_2$, by the definition of $a^x$ given in 8° and the properties
of the function $a^x$ on $\mathbb{Q}$, we have

$$a^{x_1} \le a^{r_1} < a^{r_2} \le a^{x_2} . \ \square$$
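11° For any $x_1, x_2 \in \mathbb{R}$, $a^{x_1} \cdot a^{x_2} = a^{x_1 + x_2}$.

Proof. By the estimates that we know for the absolute error in the product
and by property 9°, we can assert that for any $\varepsilon > 0$ there exists $\delta' > 0$ such
that

$$a^{x_1} \cdot a^{x_2} - \frac{\varepsilon}{2} < a^{r_1} \cdot a^{r_2} < a^{x_1} \cdot a^{x_2} + \frac{\varepsilon}{2}$$

for $|x_1 - r_1| < \delta'$ and $|x_2 - r_2| < \delta'$. Making $\delta'$ smaller if necessary, we can
choose $\delta < \delta'$ such that we also have

$$a^{r_1 + r_2} - \frac{\varepsilon}{2} < a^{x_1 + x_2} < a^{r_1 + r_2} + \frac{\varepsilon}{2}$$

for $|x_1 - r_1| < \delta$ and $|x_2 - r_2| < \delta$, that is, $|(x_1 + x_2) - (r_1 + r_2)| < 2\delta$.
But $a^{r_1} \cdot a^{r_2} = a^{r_1 + r_2}$ for $r_1, r_2 \in \mathbb{Q}$, so that these inequalities imply
$$a^{x_1} \cdot a^{x_2} - \varepsilon < a^{x_1 + x_2} < a^{x_1} \cdot a^{x_2} + \varepsilon .$$

Since $\varepsilon > 0$ is arbitrary, we conclude that

$$a^{x_1} \cdot a^{x_2} = a^{x_1 + x_2} . \ \square$$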


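12° $\lim_{x \to x_0} a^x = a^{x_0}$. (We recall that "$x \to x_0$" is an abbreviation of
"$\mathbb{R} \ni x \to x_0$".)

Proof. We first verify that $\lim_{x \to 0} a^x = 1$. Given $\varepsilon > 0$, we find $n \in \mathbb{N}$ such that

$$1 - \varepsilon < a^{-1/n} < a^{1/n} < 1 + \varepsilon .$$
Then by 10°, for $|x| < 1/n$ we have
$$1 - \varepsilon < a^{-1/n} < a^x < a^{1/n} < 1 + \varepsilon ,$$

that is, we have verified that $\lim_{x \to 0} a^x = 1$.

If we now take $\delta > 0$ so that $|a^{x - x_0} - 1| < \varepsilon a^{-x_0}$ for $|x - x_0| < \delta$, we find
$$|a^x - a^{x_0}| = a^{x_0} |a^{x - x_0} - 1| < \varepsilon ,$$

which verifies that $\lim_{x \to x_0} a^x = a^{x_0}$. □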


13° We shall show that the range of values of the function $x \mapsto a^x$ is the
set $\mathbb{R}_+$ of positive real numbers.
Proof. Let $y_0 \in \mathbb{R}_+$. If $a > 1$, then as we know, there exists $n \in \mathbb{N}$ such that
$a^{-n} < y_0 < a^n$.
By virtue of this fact, the two sets

$$A = \{x \in \mathbb{R} \mid a^x < y_0\} \quad \text{and} \quad B = \{x \in \mathbb{R} \mid y_0 < a^x\}$$

are both nonempty. But since $(x_1 < x_2) \Leftrightarrow (a^{x_1} < a^{x_2})$ (when $a > 1$), for
any numbers $x_1, x_2 \in \mathbb{R}$ such that $x_1 \in A$ and $x_2 \in B$ we have $x_1 < x_2$.
Consequently, the axiom of completeness is applicable to the sets $A$ and $B$,
and it follows that there exists $x_0$ such that $x_1 \le x_0 \le x_2$ for all $x_1 \in A$ and
$x_2 \in B$. We shall show that $a^{x_0} = y_0$.

If $a^{x_0}$ were less than $y_0$, then, since $a^{x_0 + 1/n} \to a^{x_0}$ as $n \to \infty$, there
would be a number $n \in \mathbb{N}$ such that $a^{x_0 + 1/n} < y_0$. Then we would have
$(x_0 + \frac{1}{n}) \in A$, while the point $x_0$ separates $A$ and $B$. Hence the assumption
$a^{x_0} < y_0$ is untenable. Similarly we can verify that the inequality $a^{x_0} > y_0$
is also impossible. By the properties of real numbers, we conclude from this
that $a^{x_0} = y_0$. □


14° We have assumed up to now that $a > 1$. But all the constructions
could be repeated for $0 < a < 1$. Under this condition $0 < a^r < 1$ if $r > 0$,
so that in 6° and 10° we now find that $(x_1 < x_2) \Rightarrow (a^{x_1} > a^{x_2})$ where
$0 < a < 1$.

Thus for $a > 0$, $a \ne 1$, we have constructed a real-valued function $x \mapsto a^x$
on the set $\mathbb{R}$ of real numbers with the following properties:

1) $a^1 = a$;

2) $a^{x_1} \cdot a^{x_2} = a^{x_1 + x_2}$;

3) $a^x \to a^{x_0}$ as $x \to x_0$;

4) $(a^{x_1} < a^{x_2}) \Leftrightarrow (x_1 < x_2)$ if $a > 1$, and $(a^{x_1} > a^{x_2}) \Leftrightarrow (x_1 < x_2)$ if
$0 < a < 1$;

5) the range of values of the mapping $x \mapsto a^x$ is $\mathbb{R}_+ = \{y \in \mathbb{R} \mid 0 < y\}$,
the set of positive numbers.
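The passage from rational to real exponents in 8° and 9° can be imitated numerically. The sketch below is an illustration under stated assumptions, not the construction itself: the names `rational_power` and `bracket_power` are ours, floating-point numbers replace real numbers, and the built-in power with exponent `1/n` stands in for the $n$th root whose existence is used in 3°.

```python
import math
from fractions import Fraction

def rational_power(a, r):
    """a**r for a rational r = m/n, computed as (a**(1/n))**m as in step 3."""
    root = a ** (1.0 / r.denominator)   # stand-in for the nth root of a
    return root ** r.numerator

def bracket_power(a, x, n):
    """Return (a**r1, a**r2) for rationals r1 <= x < r2 with r2 - r1 = 1/n."""
    r1 = Fraction(math.floor(x * n), n)
    r2 = r1 + Fraction(1, n)
    return rational_power(a, r1), rational_power(a, r2)

for n in (10, 1000, 10**6):
    low, high = bracket_power(2.0, math.sqrt(2), n)
    print(f"n = {n:>7}: {low:.8f} <= 2^sqrt(2) <= {high:.8f}")
```

For $a > 1$ the two printed values play the roles of a lower approximation to $s$ and an upper approximation to $i$; their difference shrinks as $n$ grows, in line with the proof that $s = i$.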


Definition 7. The mapping $x \mapsto a^x$ is called the exponential function with
base $a$.

The mapping $x \mapsto e^x$, which is the case $a = e$, is encountered particularly
often and is frequently denoted $\exp x$. In this connection, to denote the
mapping $x \mapsto a^x$, we sometimes also use the notation $\exp_a x$.

b) The logarithmic function. The properties of the exponential function
show that it is a bijective mapping $\exp_a : \mathbb{R} \to \mathbb{R}_+$. Hence it has an inverse.

Definition 8. The mapping inverse to $\exp_a : \mathbb{R} \to \mathbb{R}_+$ is called the logarithm
to base $a$ ($0 < a$, $a \ne 1$), and is denoted

$$\log_a : \mathbb{R}_+ \to \mathbb{R} .$$

Definition 9. For base $a = e$, the logarithm is called the natural logarithm
and is denoted $\ln : \mathbb{R}_+ \to \mathbb{R}$.
The reason for the terminology becomes clear under a different approach
to logarithms, one that is in many ways more natural and transparent, which
we shall explain after constructing the fundamentals of differential and
integral calculus.

By definition of the logarithm as the function inverse to the exponential
function, we have

$$\forall x \in \mathbb{R} \ \bigl(\log_a (a^x) = x\bigr) ,$$
$$\forall y \in \mathbb{R}_+ \ \bigl(a^{\log_a y} = y\bigr) .$$

It follows from this definition and the properties of the exponential function
in particular that in its domain of definition $\mathbb{R}_+$ the logarithm has the
following properties:

1') $\log_a a = 1$;

2') $\log_a (y_1 \cdot y_2) = \log_a y_1 + \log_a y_2$;

3') $\log_a y \to \log_a y_0$ as $\mathbb{R}_+ \ni y \to y_0 \in \mathbb{R}_+$;

4') $(\log_a y_1 < \log_a y_2) \Leftrightarrow (y_1 < y_2)$ if $a > 1$ and $(\log_a y_1 > \log_a y_2) \Leftrightarrow (y_1 < y_2)$ if $0 < a < 1$;

5') the range of values of the function $\log_a : \mathbb{R}_+ \to \mathbb{R}$ is the set $\mathbb{R}$ of all
real numbers.


Proof. We obtain 1') from property 1) of the exponential function and the
definition of the logarithm.

We obtain property 2') from property 2) of the exponential function.
Indeed, let $x_1 = \log_a y_1$ and $x_2 = \log_a y_2$. Then $y_1 = a^{x_1}$ and $y_2 = a^{x_2}$, and
so by 2), $y_1 \cdot y_2 = a^{x_1} \cdot a^{x_2} = a^{x_1 + x_2}$, from which it follows that
$\log_a (y_1 \cdot y_2) = x_1 + x_2$.

Similarly, property 4) of the exponential function implies property 4') of
the logarithm.

It is obvious that 5) $\Rightarrow$ 5').

Property 3') remains to be proved.

By property 2') of the logarithm we have

$$\log_a y - \log_a y_0 = \log_a \left(\frac{y}{y_0}\right) ,$$
and therefore the inequalities

$$-\varepsilon < \log_a y - \log_a y_0 < \varepsilon$$

are equivalent to the relation
$$\log_a (a^{-\varepsilon}) = -\varepsilon < \log_a \left(\frac{y}{y_0}\right) < \varepsilon = \log_a (a^{\varepsilon}) ,$$

which by property 4') of the logarithm is equivalent to

$$a^{-\varepsilon} < \frac{y}{y_0} < a^{\varepsilon} \quad \text{for } a > 1 ,$$
$$a^{\varepsilon} < \frac{y}{y_0} < a^{-\varepsilon} \quad \text{for } 0 < a < 1 .$$

In any case we find that if
$$y_0 a^{-\varepsilon} < y < y_0 a^{\varepsilon} \quad \text{when } a > 1$$
or
$$y_0 a^{\varepsilon} < y < y_0 a^{-\varepsilon} \quad \text{when } 0 < a < 1 ,$$

we have
$$-\varepsilon < \log_a y - \log_a y_0 < \varepsilon .$$

Thus we have proved that

$$\lim_{\mathbb{R}_+ \ni y \to y_0 \in \mathbb{R}_+} \log_a y = \log_a y_0 . \ \square$$
Figure 3.2 shows the graphs of the functions $e^x$, $10^x$, $\ln x$, and $\log_{10} x =: \log x$;
Fig. 3.3 gives the graphs of $(\frac{1}{e})^x$, $0.1^x$, $\log_{1/e} x$, and $\log_{0.1} x$.
We now give a more detailed discussion of one property of the logarithm 
that we shall have frequent occasion to use. 
We shall show that the equality

6') $\log_a (b^\alpha) = \alpha \log_a b$

holds for any $b > 0$ and any $\alpha \in \mathbb{R}$.

Proof. 1° The equality is true for $\alpha = n \in \mathbb{N}$. For by property 2') of the
logarithm and induction we find $\log_a (y_1 \cdots y_n) = \log_a y_1 + \cdots + \log_a y_n$, so
that

$$\log_a (b^n) = \log_a b + \cdots + \log_a b = n \log_a b .$$

[Fig. 3.2. Graphs of $e^x$, $10^x$, $\ln x$, and $\log x$.]

[Fig. 3.3. Graphs of $(1/e)^x$, $0.1^x$, $\log_{1/e} x$, and $\log_{0.1} x$.]


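2° $\log_a (b^{-1}) = -\log_a b$, for if $\beta = \log_a b$, then
$b = a^{\beta}$, $b^{-1} = a^{-\beta}$, and $\log_a (b^{-1}) = -\beta$.

3° From 1° and 2° we now conclude that the equality $\log_a (b^\alpha) = \alpha \log_a b$
holds for $\alpha \in \mathbb{Z}$.

4° $\log_a (b^{1/n}) = \frac{1}{n} \log_a b$ for $n \in \mathbb{Z}$, $n \ne 0$. Indeed,

$$\log_a b = \log_a \bigl(b^{1/n}\bigr)^n = n \log_a \bigl(b^{1/n}\bigr) .$$

5° We can now verify that the assertion holds for any rational number
$\alpha = \frac{m}{n} \in \mathbb{Q}$. In fact,
$$\frac{m}{n} \log_a b = m \log_a \bigl(b^{1/n}\bigr) = \log_a \bigl(b^{1/n}\bigr)^m = \log_a \bigl(b^{m/n}\bigr) .$$

6° But if the equality $\log_a b^r = r \log_a b$ holds for all $r \in \mathbb{Q}$, then letting $r$
in $\mathbb{Q}$ tend to $\alpha$, we find by property 3) for the exponential function and 3')
for the logarithm that if $r$ is sufficiently close to $\alpha$, then $b^r$ is close to $b^\alpha$ and
$\log_a b^r$ is close to $\log_a b^\alpha$. This means that

$$\lim_{\mathbb{Q} \ni r \to \alpha} \log_a b^r = \log_a b^\alpha .$$

But $\log_a b^r = r \log_a b$, and therefore

$$\log_a b^\alpha = \lim_{\mathbb{Q} \ni r \to \alpha} \log_a b^r = \lim_{\mathbb{Q} \ni r \to \alpha} r \log_a b = \alpha \log_a b . \ \square$$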
From the property of the logarithm just proved, one can conclude that
the following equality holds for any $\alpha, \beta \in \mathbb{R}$ and $a > 0$:

6) $(a^\alpha)^\beta = a^{\alpha\beta}$.

Proof. For $a = 1$ we have $1^\alpha = 1$ by definition for all $\alpha \in \mathbb{R}$. Thus the equality
is trivial in this case.
If $a \ne 1$, then by what has just been proved we have

$$\log_a \bigl((a^\alpha)^\beta\bigr) = \beta \log_a (a^\alpha) = \beta \cdot \alpha \log_a a = \beta \cdot \alpha = \log_a \bigl(a^{\alpha\beta}\bigr) ,$$
which by property 4') of the logarithm is equivalent to this equality. □


c) The power function. If we take $1^\alpha = 1$, then for all $x > 0$ and $\alpha \in \mathbb{R}$
we have defined the quantity $x^\alpha$ (read "$x$ to power $\alpha$").

Definition 10. The function $x \mapsto x^\alpha$ defined on the set $\mathbb{R}_+$ of positive
numbers is called a power function, and the number $\alpha$ is called its exponent.

A power function is obviously the composition of an exponential function
and the logarithm; more precisely

$$x^\alpha = a^{\log_a (x^\alpha)} = a^{\alpha \log_a x} .$$

Figure 3.4 shows the graphs of the function $y = x^\alpha$ for different values of
the exponent.
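The representation of the power function as a composition, together with property 6') of the logarithm, can be checked numerically. The sketch below is an illustration only: the names `log_base` and `power_via_base` are ours, and the logarithm is computed from the library function `math.log` rather than from the construction given in the text.

```python
import math

def log_base(a, y):
    """Logarithm of y > 0 to base a (a > 0, a != 1), via natural logarithms."""
    return math.log(y) / math.log(a)

def power_via_base(x, alpha, a=10.0):
    """x**alpha computed as a**(alpha * log_a x), for x > 0."""
    return a ** (alpha * log_base(a, x))

# The result should not depend on the auxiliary base a.
for a in (10.0, 2.0, 0.5):
    print(a, power_via_base(2.0, 0.5, a), math.sqrt(2.0))
```

The auxiliary base $a$ is arbitrary (subject to $a > 0$, $a \ne 1$), which is why any of the three bases in the loop gives approximately the same value.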


[Fig. 3.4. Graphs of $y = x^\alpha$ for several values of the exponent $\alpha$.]


3.2.3 The General Definition of the Limit of a Function 
(Limit over a Base) 


When proving the properties of the limit of a function, we verified that the 
only requirements imposed on the deleted neighborhoods in which our func- 
tions were defined and which arose in the course of the proofs were the prop- 
erties B1) and B2), mentioned in the introduction to the previous subsection. 
This fact justifies the definition of the following mathematical object. 
a. Bases; Definition and Elementary Properties


Definition 11. A set $\mathcal{B}$ of subsets $B \subset X$ of a set $X$ is called a base in $X$ if
the following conditions hold:

B1) $\forall B \in \mathcal{B} \ (B \ne \varnothing)$;

B2) $\forall B_1 \in \mathcal{B} \ \forall B_2 \in \mathcal{B} \ \exists B \in \mathcal{B} \ (B \subset B_1 \cap B_2)$.


In other words, the elements of the collection B are nonempty subsets of 
X and the intersection of any two of them always contains an element of the 
same collection. 

We now list some of the more useful bases in analysis. 


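| Notation for the base | Read | Sets (elements) of the base | Definition of and notation for elements |
|---|---|---|---|
| $x \to a$ | $x$ tends to $a$ | Deleted neighborhoods of $a \in \mathbb{R}$ | $\mathring{U}(a) := \{x \in \mathbb{R} \mid a - \delta_1 < x < a + \delta_2 \wedge x \ne a\}$, where $\delta_1 > 0$, $\delta_2 > 0$ |
| $x \to \infty$ | $x$ tends to infinity | Neighborhoods of infinity | $U(\infty) := \{x \in \mathbb{R} \mid \delta < |x|\}$, where $\delta \in \mathbb{R}$ |
| $x \to a$, $x \in E$, or $E \ni x \to a$, or $x \underset{\in E}{\to} a$ | $x$ tends to $a$ in $E$ | Deleted neighborhoods* of $a$ in $E$ | $\mathring{U}_E(a) := E \cap \mathring{U}(a)$ |
| $x \to \infty$, $x \in E$, or $E \ni x \to \infty$, or $x \underset{\in E}{\to} \infty$ | $x$ tends to infinity in $E$ | Neighborhoods** of infinity in $E$ | $U_E(\infty) := E \cap U(\infty)$ |

* It is assumed that $a$ is a limit point of $E$.
** It is assumed that $E$ is not bounded.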


If $E = E_a^+ = \{x \in \mathbb{R} \mid x > a\}$ (resp. $E = E_a^- = \{x \in \mathbb{R} \mid x < a\}$) we write
$x \to a + 0$ (resp. $x \to a - 0$) instead of $x \to a$, $x \in E$, and we say that $x$ tends
to $a$ from the right (resp. $x$ tends to $a$ from the left) or through larger values
(resp. through smaller values). When $a = 0$ it is customary to write $x \to +0$
(resp. $x \to -0$) instead of $x \to 0 + 0$ (resp. $x \to 0 - 0$).

The notation $E \ni x \to a + 0$ (resp. $E \ni x \to a - 0$) will be used instead
of $x \to a$, $x \in E \cap E_a^+$ (resp. $x \to a$, $x \in E \cap E_a^-$). It means that $x$ tends to
$a$ in $E$ while remaining larger (resp. smaller) than $a$.
If
$$E = E_\infty^+ = \{x \in \mathbb{R} \mid c < x\} \quad (\text{resp. } E = E_\infty^- = \{x \in \mathbb{R} \mid x < c\}) ,$$

we write $x \to +\infty$ (resp. $x \to -\infty$) instead of $x \to \infty$, $x \in E$ and say that $x$
tends to positive infinity (resp. $x$ tends to negative infinity).

The notation $E \ni x \to +\infty$ (resp. $E \ni x \to -\infty$) will be used instead of
$x \to \infty$, $x \in E \cap E_\infty^+$ (resp. $x \to \infty$, $x \in E \cap E_\infty^-$).

When $E = \mathbb{N}$, we shall write (when no confusion can arise), as is customary
in the theory of limits of sequences, $n \to \infty$ instead of $x \to \infty$, $x \in \mathbb{N}$.

We remark that all the bases just listed have the property that the inter- 
section of two elements of the base is itself an element of the base, not merely 
a set containing an element of the base. We shall meet with other bases in 
the study of functions defined on sets different from the real line.¹¹

We note also that the term “base” used here is an abbreviation for what 
is called a “filter base”, and the limit over a base that we introduce below is, 
as far as analysis is concerned, the most important part of the concept of a 
limit over a filter¹², created by the modern French mathematician H. Cartan.


b. The Limit of a Function over a Base


Definition 12. Let $f : X \to \mathbb{R}$ be a function defined on a set $X$ and $\mathcal{B}$ a
base in $X$. A number $A \in \mathbb{R}$ is called the limit of the function $f$ over the
base $\mathcal{B}$ if for every neighborhood $V(A)$ of $A$ there is an element $B \in \mathcal{B}$ whose
image $f(B)$ is contained in $V(A)$.

If $A$ is the limit of $f : X \to \mathbb{R}$ over the base $\mathcal{B}$, we write

$$\lim_{\mathcal{B}} f(x) = A .$$
We now repeat the definition of the limit over a base in logical symbols:

$$\Bigl(\lim_{\mathcal{B}} f(x) = A\Bigr) := \forall V(A) \ \exists B \in \mathcal{B} \ \bigl(f(B) \subset V(A)\bigr) .$$

Since we are considering numerical-valued functions at the moment, it is
useful to keep in mind the following form of this fundamental definition:

$$\Bigl(\lim_{\mathcal{B}} f(x) = A\Bigr) := \forall \varepsilon > 0 \ \exists B \in \mathcal{B} \ \forall x \in B \ \bigl(|f(x) - A| < \varepsilon\bigr) .$$


In this form we take an e-neighborhood (symmetric with respect to A) 
instead of an arbitrary neighborhood V(A). The equivalence of these defini- 
tions for real-valued functions follows from the fact mentioned earlier that 


11 For example, the set of open disks (not containing their boundary circles) con- 
taining a given point of the plane is a base. The intersection of two elements of 
the base is not always a disk, but always contains a disk from the collection. 

12 For more details, see Bourbaki's General Topology, Addison-Wesley, 1966.
every neighborhood of a point contains a symmetric neighborhood of the
same point (carry out the proof in full!).

We have now given the general definition of the limit of a function over a 
base. Earlier we considered examples of the bases most often used in analysis. 
In a specific problem in which one or another of these bases arises, one must 
know how to decode the general definition and write it in the form specific 
to that base. 

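Thus,

$$\Bigl(\lim_{x \to a - 0} f(x) = A\Bigr) := \forall \varepsilon > 0 \ \exists \delta > 0 \ \forall x \in \, ]a - \delta, a[ \ \bigl(|f(x) - A| < \varepsilon\bigr) ,$$

$$\Bigl(\lim_{x \to -\infty} f(x) = A\Bigr) := \forall \varepsilon > 0 \ \exists \delta \in \mathbb{R} \ \forall x < \delta \ \bigl(|f(x) - A| < \varepsilon\bigr) .$$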
In our study of examples of bases we have in particular introduced the 
concept of a neighborhood of infinity. If we use that concept, then it makes 
sense to adopt the following conventions in accordance with the general def- 
inition of limit: 


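$$\Bigl(\lim_{\mathcal{B}} f(x) = \infty\Bigr) := \forall V(\infty) \ \exists B \in \mathcal{B} \ \bigl(f(B) \subset V(\infty)\bigr) ,$$
or, what is the same,
$$\Bigl(\lim_{\mathcal{B}} f(x) = \infty\Bigr) := \forall \varepsilon > 0 \ \exists B \in \mathcal{B} \ \forall x \in B \ \bigl(\varepsilon < |f(x)|\bigr) ,$$
$$\Bigl(\lim_{\mathcal{B}} f(x) = +\infty\Bigr) := \forall \varepsilon \in \mathbb{R} \ \exists B \in \mathcal{B} \ \forall x \in B \ \bigl(\varepsilon < f(x)\bigr) ,$$
$$\Bigl(\lim_{\mathcal{B}} f(x) = -\infty\Bigr) := \forall \varepsilon \in \mathbb{R} \ \exists B \in \mathcal{B} \ \forall x \in B \ \bigl(f(x) < \varepsilon\bigr) .$$

The letter $\varepsilon$ is usually assumed to represent a small number. Such is not
the case in the definitions just given, of course. In accordance with the usual
conventions, for example, we could write

$$\Bigl(\lim_{x \to +\infty} f(x) = -\infty\Bigr) := \forall \varepsilon \in \mathbb{R} \ \exists \delta \in \mathbb{R} \ \forall x > \delta \ \bigl(f(x) < \varepsilon\bigr) .$$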

We advise the reader to write out independently the full definition of limit 
for different bases in the cases of both finite (numerical) and infinite limits. 

In order to regard the theorems on limits that we proved for the special
base $E \ni x \to a$ in Subsect. 3.2.2 as having been proved in the general case of
a limit over an arbitrary base, we need to make suitable definitions of what
it means for a function to be ultimately constant, ultimately bounded, and
infinitesimal over a given base.


Definition 13. A function f : X — R is ultimately constant over the base 
B if there exists a number A € R and an element B € B such that f(x) = A 
for all x € B. 


Definition 14. A function f : X —> R is ultimately bounded over the base B 
if there exists a number c > 0 and an element B € B such that |f (x)| < c for 
all x € B. 
Definition 15. A function $f : X \to \mathbb{R}$ is infinitesimal over the base $\mathcal{B}$ if
$\lim_{\mathcal{B}} f(x) = 0$.


After these definitions and the fundamental remark that all proofs of the 
theorems on limits used only the properties B1) and B2), we may regard all 
the properties of limits established in Subsect. 3.2.2 as valid for limits over 
any base. 

In particular, we can now speak of the limit of a function as $x \to \infty$ or as
$x \to -\infty$ or as $x \to +\infty$.

In addition, we have now assured ourselves that we can also apply the 
theory of limits in the case when the functions are defined on sets that are 
not necessarily sets of numbers; this will turn out to be especially valuable 
later on. For example, the length of a curve is a numerical-valued function 
defined on a class of curves. If we know this function on broken lines, we can 
define it for more complicated curves, for example, for a circle, by passing to 
the limit. 

At present the main use we have for this observation and the concept 
of a base introduced in connection with it is that they free us from the 
verifications and formal proofs of theorems on limits for each specific type 
of limiting passage, or, in our current terminology, for each specific type of 
base. 

In order to master completely the concept of a limit over an arbitrary 
base, we shall carry out the proofs of the following properties of the limit of 
a function in general form. 


3.2.4 Existence of the Limit of a Function 


a. The Cauchy Criterion Before stating the Cauchy criterion, we give the 
following useful definition. 


Definition 16. The oscillation of a function $f : X \to \mathbb{R}$ on a set $E \subset X$ is

$$\omega(f, E) := \sup_{x_1, x_2 \in E} |f(x_1) - f(x_2)| ,$$

that is, the least upper bound of the absolute value of the difference of the
values of the function at two arbitrary points $x_1, x_2 \in E$.

Example 11. $\omega(x^2, [-1, 2]) = 4$;

Example 12. $\omega(x, [-1, 2]) = 3$;

Example 13. $\omega(x, ]-1, 2[) = 3$;

Example 14. $\omega(\operatorname{sgn} x, [-1, 2]) = 2$;

Example 15. $\omega(\operatorname{sgn} x, [0, 2]) = 1$;

Example 16. $\omega(\operatorname{sgn} x, ]0, 2]) = 0$.
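The values in Examples 11 to 16 can be approximated by sampling. The sketch below is an illustration under the assumption that a finite grid represents the interval; the names `oscillation`, `grid`, and `sgn` are ours. On a finite sample the supremum of $|f(x_1) - f(x_2)|$ equals the difference between the largest and smallest sampled values, which can only underestimate the true oscillation.

```python
def oscillation(f, points):
    """max |f(x1) - f(x2)| over a finite sample, i.e. max f - min f."""
    values = [f(x) for x in points]
    return max(values) - min(values)

def grid(a, b, n=300, closed_left=True, closed_right=True):
    """Equally spaced sample of an interval with endpoints a < b."""
    ks = range(0 if closed_left else 1, n + 1 if closed_right else n)
    return [a + (b - a) * k / n for k in ks]

def sgn(x):
    """Sign of x: -1, 0, or 1."""
    return (x > 0) - (x < 0)

print(oscillation(lambda x: x * x, grid(-1, 2)))            # Example 11
print(oscillation(sgn, grid(-1, 2)))                         # Example 14
print(oscillation(sgn, grid(0, 2)))                          # Example 15
print(oscillation(sgn, grid(0, 2, closed_left=False)))       # Example 16
```

For the open interval of Example 13 the sampled value stays slightly below 3, since the supremum is not attained there.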
Theorem 4. (The Cauchy criterion for the existence of a limit of a function).
Let $X$ be a set and $\mathcal{B}$ a base in $X$.

A function $f : X \to \mathbb{R}$ has a limit over the base $\mathcal{B}$ if and only if for every
$\varepsilon > 0$ there exists $B \in \mathcal{B}$ such that the oscillation of $f$ on $B$ is less than $\varepsilon$.

Thus,
$$\exists \lim_{\mathcal{B}} f(x) \Leftrightarrow \forall \varepsilon > 0 \ \exists B \in \mathcal{B} \ \bigl(\omega(f; B) < \varepsilon\bigr) .$$

Proof. Necessity. If $\lim_{\mathcal{B}} f(x) = A \in \mathbb{R}$, then, for all $\varepsilon > 0$, there exists an
element $B \in \mathcal{B}$ such that $|f(x) - A| < \varepsilon/3$ for all $x \in B$. But then, for any
$x_1, x_2 \in B$ we have

$$|f(x_1) - f(x_2)| \le |f(x_1) - A| + |f(x_2) - A| < \frac{2\varepsilon}{3} ,$$

and therefore $\omega(f; B) < \varepsilon$.

Sufficiency. We now prove the main part of the criterion, which asserts
that if for every $\varepsilon > 0$ there exists $B \in \mathcal{B}$ for which $\omega(f; B) < \varepsilon$, then the
function has a limit over $\mathcal{B}$.

Taking $\varepsilon$ successively equal to $1, 1/2, \ldots, 1/n, \ldots$, we construct a sequence
$B_1, B_2, \ldots, B_n, \ldots$ of elements of $\mathcal{B}$ such that $\omega(f; B_n) < 1/n$, $n \in \mathbb{N}$.
Since $B_n \ne \varnothing$, we can choose a point $x_n$ in each $B_n$. The sequence
$f(x_1), f(x_2), \ldots, f(x_n), \ldots$ is a Cauchy sequence. Indeed, $B_n \cap B_m \ne \varnothing$, and,
taking an auxiliary point $x \in B_n \cap B_m$, we find that
$|f(x_n) - f(x_m)| \le |f(x_n) - f(x)| + |f(x) - f(x_m)| < 1/n + 1/m$. By the Cauchy criterion
for convergence of a sequence, the sequence $\{f(x_n), n \in \mathbb{N}\}$ has a limit
$A$. It follows from the inequality established above, if we let $m \to \infty$,
that $|f(x_n) - A| \le 1/n$. We now conclude, taking account of the inequality
$\omega(f; B_n) < 1/n$, that $|f(x) - A| < \varepsilon$ at every point $x \in B_n$ if
$n > N = [2/\varepsilon] + 1$. □


Remark. This proof, as we shall see below, remains valid for functions with 
values in any so-called complete space Y. If Y = R, which is the case we are 
most interested in just now, we can if we wish use the same idea as in the 
proof of the sufficiency of the Cauchy criterion for sequences. 


Proof. Setting $m_B = \inf_{x \in B} f(x)$ and $M_B = \sup_{x \in B} f(x)$, and remarking that
$m_{B_1} \le m_{B_1 \cap B_2} \le M_{B_1 \cap B_2} \le M_{B_2}$ for any elements $B_1$ and $B_2$ of the
base $\mathcal{B}$, we find by the axiom of completeness that there exists a number
$A \in \mathbb{R}$ separating the numerical sets $\{m_B\}$ and $\{M_B\}$, where $B \in \mathcal{B}$. Since
$\omega(f; B) = M_B - m_B$, we can now conclude that, since $\omega(f; B) < \varepsilon$, we have
$|f(x) - A| < \varepsilon$ at every point $x \in B$. □


Example 17. We shall show that when $X = \mathbb{N}$ and $\mathcal{B}$ is the base $n \to \infty$,
$n \in \mathbb{N}$, the general Cauchy criterion just proved for the existence of the limit
of a function coincides with the Cauchy criterion already studied for the
existence of a limit of a sequence.

Indeed, an element of the base $n \to \infty$, $n \in \mathbb{N}$, is a set $B = \mathbb{N} \cap U(\infty) = \{n \in \mathbb{N} \mid N < n\}$
consisting of the natural numbers $n \in \mathbb{N}$ larger than some
number $N \in \mathbb{R}$. Without loss of generality we may assume $N \in \mathbb{N}$. The
relation $\omega(f; B) \le \varepsilon$ now means that $|f(n_1) - f(n_2)| \le \varepsilon$ for all $n_1, n_2 > N$.

Thus, for a function $f : \mathbb{N} \to \mathbb{R}$, the condition that for any $\varepsilon > 0$ there
exists $B \in \mathcal{B}$ such that $\omega(f; B) < \varepsilon$ is equivalent to the condition that the
sequence $\{f(n)\}$ be a Cauchy sequence.


b. The Limit of a Composite Function 


Theorem 5. (The limit of a composite function). Let $Y$ be a set, $\mathcal{B}_Y$ a base
in $Y$, and $g : Y \to \mathbb{R}$ a mapping having a limit over the base $\mathcal{B}_Y$. Let $X$ be
a set, $\mathcal{B}_X$ a base in $X$, and $f : X \to Y$ a mapping of $X$ into $Y$ such that
for every element $B_Y \in \mathcal{B}_Y$ there exists $B_X \in \mathcal{B}_X$ whose image $f(B_X)$ is
contained in $B_Y$.

Under these hypotheses, the composition $g \circ f : X \to \mathbb{R}$ of the mappings $f$
and $g$ is defined and has a limit over the base $\mathcal{B}_X$, and
$\lim_{\mathcal{B}_X} (g \circ f)(x) = \lim_{\mathcal{B}_Y} g(y)$.

Proof. The composite function $g \circ f : X \to \mathbb{R}$ is defined, since $f(X) \subset Y$.
Suppose $\lim_{\mathcal{B}_Y} g(y) = A$. We shall show that $\lim_{\mathcal{B}_X} (g \circ f)(x) = A$. Given
a neighborhood $V(A)$ of $A$, we find $B_Y \in \mathcal{B}_Y$ such that $g(B_Y) \subset V(A)$.
By hypothesis, there exists $B_X \in \mathcal{B}_X$ such that $f(B_X) \subset B_Y$. But then
$(g \circ f)(B_X) = g(f(B_X)) \subset g(B_Y) \subset V(A)$. We have thus verified that $A$ is
the limit of the function $(g \circ f) : X \to \mathbb{R}$ over the base $\mathcal{B}_X$. □


Example 18. Let us find the following limit:

$$\lim_{x \to 0} \frac{\sin 7x}{7x} .$$

If we set $g(y) = \frac{\sin y}{y}$ and $f(x) = 7x$, then $(g \circ f)(x) = \frac{\sin 7x}{7x}$. In this
case $Y = \mathbb{R} \setminus 0$ and $X = \mathbb{R}$. Since $\lim_{y \to 0} g(y) = \lim_{y \to 0} \frac{\sin y}{y} = 1$, we can apply
the theorem if we verify that for any element of the base $y \to 0$ there is an
element of the base $x \to 0$ whose image under the mapping $f(x) = 7x$ is
contained in the given element of the base $y \to 0$.

The elements of the base $y \to 0$ are the deleted neighborhoods $\mathring{U}_Y(0)$ of
the point $0 \in \mathbb{R}$.

The elements of the base $x \to 0$ are also deleted neighborhoods $\mathring{U}_X(0)$ of
the point $0 \in \mathbb{R}$. Let $\mathring{U}_Y(0) = \{y \in \mathbb{R} \mid \alpha < y < \beta, \ y \ne 0\}$ (where $\alpha, \beta \in \mathbb{R}$
and $\alpha < 0$, $\beta > 0$) be an arbitrary deleted neighborhood of 0 in $Y$. If we take
$\mathring{U}_X(0) = \{x \in \mathbb{R} \mid \frac{\alpha}{7} < x < \frac{\beta}{7}, \ x \ne 0\}$, this deleted neighborhood of 0 in $X$
has the property that $f(\mathring{U}_X(0)) = \mathring{U}_Y(0) \subset \mathring{U}_Y(0)$.
The hypotheses of the theorem are therefore satisfied, and we can now
assert that

$$\lim_{x \to 0} \frac{\sin 7x}{7x} = \lim_{y \to 0} \frac{\sin y}{y} = 1 .$$
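The verification carried out in Example 18 can be mirrored numerically. The sketch below is an illustration only: the names `g`, `f`, and `composite` follow the text, while the particular values of `alpha` and `beta` and the sampling grid are our own arbitrary choices.

```python
import math

def g(y):
    return math.sin(y) / y          # defined for y != 0

def f(x):
    return 7 * x

def composite(x):
    return g(f(x))                  # sin(7x) / (7x)

# A deleted neighborhood ]alpha, beta[ \ {0} of 0 in Y, and the deleted
# neighborhood ]alpha/7, beta/7[ \ {0} of 0 in X chosen in the text.
alpha, beta = -0.3, 0.2
xs = [alpha / 7 + (beta - alpha) / 7 * k / 1000 for k in range(1, 1000)]
xs = [x for x in xs if x != 0]

# Every sampled point of the X-neighborhood is mapped into the Y-neighborhood.
inside = all(alpha < f(x) < beta and f(x) != 0 for x in xs)
print(inside)

for x in (0.1, 0.01, 0.001):
    print(x, composite(x))          # the values approach 1
```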


Example 19. The function $g(y) = |\operatorname{sgn} y|$, as we have seen (see Example 3),
has the limit $\lim_{y \to 0} |\operatorname{sgn} y| = 1$.

The function $y = f(x) = x \sin \frac{1}{x}$, which is defined for $x \ne 0$, also has the
limit $\lim_{x \to 0} x \sin \frac{1}{x} = 0$ (see Example 1).

However, the function $(g \circ f)(x) = \left|\operatorname{sgn}\left(x \sin \frac{1}{x}\right)\right|$ has no limit as $x \to 0$.

Indeed, in any deleted neighborhood of $x = 0$ there are zeros of the
function $\sin \frac{1}{x}$, so that the function $\left|\operatorname{sgn}\left(x \sin \frac{1}{x}\right)\right|$ assumes both the value
1 and the value 0 in any such neighborhood. By the Cauchy criterion, this
function cannot have a limit as $x \to 0$.

But does this example not contradict Theorem 5?

Check, as we did in the preceding example, to see whether the hypotheses
of the theorem are satisfied.


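Example 20. Let us show that

$$\lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x = e .$$

Proof. Let us make the following assumptions:

$Y = \mathbb{N}$, $\mathcal{B}_Y$ is the base $n \to \infty$, $n \in \mathbb{N}$;
$X = \mathbb{R}_+ = \{x \in \mathbb{R} \mid x > 0\}$, $\mathcal{B}_X$ is the base $x \to +\infty$;

$f : X \to Y$ is the mapping $x \mapsto [x]$,

where $[x]$ is the integer part of $x$ (that is, the largest integer not larger than $x$).

Then for any $B_Y = \{n \in \mathbb{N} \mid n > N\}$ in the base $n \to \infty$, $n \in \mathbb{N}$, there
obviously exists an element $B_X = \{x \in \mathbb{R} \mid x > N + 1\}$ of the base $x \to +\infty$
whose image under the mapping $x \mapsto [x]$ is contained in $B_Y$.

The functions $g(n) = \left(1 + \frac{1}{n}\right)^n$, $g_1(n) = \left(1 + \frac{1}{n+1}\right)^n$, and $g_2(n) = \left(1 + \frac{1}{n}\right)^{n+1}$,
as we know, have the number $e$ as their limit in the base $n \to \infty$, $n \in \mathbb{N}$.

By Theorem 5 on the limit of a composite function, we can now assert
that the functions

$$(g \circ f)(x) = \left(1 + \frac{1}{[x]}\right)^{[x]} , \quad (g_1 \circ f)(x) = \left(1 + \frac{1}{[x] + 1}\right)^{[x]} , \quad (g_2 \circ f)(x) = \left(1 + \frac{1}{[x]}\right)^{[x] + 1}$$

also have $e$ as their limit over the base $x \to +\infty$.
It now remains for us only to remark that

$$\left(1 + \frac{1}{[x] + 1}\right)^{[x]} < \left(1 + \frac{1}{x}\right)^x < \left(1 + \frac{1}{[x]}\right)^{[x] + 1}$$

for $x \ge 1$. Since the extreme terms here tend to $e$ as $x \to +\infty$, it follows from
Theorem 3 on the properties of a limit that $\lim_{x \to +\infty} \left(1 + \frac{1}{x}\right)^x = e$. □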
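The two-sided estimate through the integer part $[x]$ can be observed numerically. The sketch below is an illustration only; the name `bracket` is ours, and floating-point arithmetic replaces exact real arithmetic.

```python
import math

def bracket(x):
    """Return the three terms of the inequality, for x >= 1."""
    n = math.floor(x)                     # [x], the integer part of x
    lower = (1 + 1 / (n + 1)) ** n
    middle = (1 + 1 / x) ** x
    upper = (1 + 1 / n) ** (n + 1)
    return lower, middle, upper

for x in (1.5, 10.25, 1000.75, 1e6 + 0.5):
    lower, middle, upper = bracket(x)
    print(f"x = {x}: {lower:.6f} < {middle:.6f} < {upper:.6f}   (e = {math.e:.6f})")
```

Both outer terms depend on $x$ only through $[x]$, which is exactly why their limits reduce to limits of sequences.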
xr— +00 e 


Using Theorem 5 on the limit of a composite function, we now show that $\lim_{x\to-\infty}\left(1+\frac{1}{x}\right)^{x} = e$.

Proof. We write

$$\lim_{x\to-\infty}\left(1+\frac{1}{x}\right)^{x} = \lim_{(-t)\to-\infty}\left(1+\frac{1}{(-t)}\right)^{(-t)} = \lim_{t\to+\infty}\left(1-\frac{1}{t}\right)^{-t} = \lim_{t\to+\infty}\left(1+\frac{1}{t-1}\right)^{t} =$$

$$= \lim_{t\to+\infty}\left(1+\frac{1}{t-1}\right)^{t-1}\cdot\lim_{t\to+\infty}\left(1+\frac{1}{t-1}\right) = \lim_{t\to+\infty}\left(1+\frac{1}{t-1}\right)^{t-1} = \lim_{u\to+\infty}\left(1+\frac{1}{u}\right)^{u} = e.$$

When we take account of the substitutions $u = t - 1$ and $t = -x$, these equalities can be justified in reverse order (!) using Theorem 5. Indeed, only after we have arrived at the limit $\lim_{u\to+\infty}\left(1+\frac{1}{u}\right)^{u}$, whose existence has already been proved, does the theorem allow us to assert that the preceding limit also exists and has the same value. Then the limit before that one also exists, and by a finite number of such transitions we finally arrive at the original limit. This is a very typical example of the procedure for using the theorem on the limit of a composite function in computing limits.
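Thus, we have

$$\lim_{x\to-\infty}\left(1+\frac{1}{x}\right)^{x} = e = \lim_{x\to+\infty}\left(1+\frac{1}{x}\right)^{x}.$$

It follows that $\lim_{x\to\infty}\left(1+\frac{1}{x}\right)^{x} = e$. Indeed, let $\varepsilon > 0$ be given.

Since $\lim_{x\to-\infty}\left(1+\frac{1}{x}\right)^{x} = e$, there exists $c_1 \in \mathbb{R}$ such that $\left|\left(1+\frac{1}{x}\right)^{x} - e\right| < \varepsilon$ for $x < c_1$.

Since $\lim_{x\to+\infty}\left(1+\frac{1}{x}\right)^{x} = e$, there exists $c_2 \in \mathbb{R}$ such that $\left|\left(1+\frac{1}{x}\right)^{x} - e\right| < \varepsilon$ for $c_2 < x$.

Then for $|x| > c = \max\{|c_1|, |c_2|\}$ we have $\left|\left(1+\frac{1}{x}\right)^{x} - e\right| < \varepsilon$, which verifies that $\lim_{x\to\infty}\left(1+\frac{1}{x}\right)^{x} = e$. □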
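The squeeze used in Example 20 and the two one-sided limits can be watched numerically. The following is only a sketch in floating-point arithmetic, not a proof; the function names are hypothetical and chosen for this illustration.

```python
import math

def one_plus_inverse_power(x):
    """Return (1 + 1/x)**x; meaningful for x > 0 or x < -1."""
    return (1.0 + 1.0 / x) ** x

def squeeze_bounds(x):
    """Lower and upper bounds from Example 20 built from the integer part [x], x >= 1."""
    n = math.floor(x)
    lower = (1.0 + 1.0 / (n + 1)) ** n
    upper = (1.0 + 1.0 / n) ** (n + 1)
    return lower, upper

for x in (10.5, 1000.5, 1e6 + 0.5):
    lower, upper = squeeze_bounds(x)
    # Both the bounds and the middle term approach e as x grows.
    print(x, lower, one_plus_inverse_power(x), upper)
    # The same expression evaluated at -x also approaches e.
    print(-x, one_plus_inverse_power(-x))
```

The printed values only suggest the behavior; the rigorous argument is the one based on the theorem on the limit of a composite function.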


Example 21. We shall show that

$$\lim_{t\to 0}(1+t)^{1/t} = e.$$

Proof. After the substitution $x = 1/t$, we return to the limit considered in the preceding example. □


Example 22.

$$\lim_{x\to+\infty}\frac{x}{q^{x}} = 0, \quad\text{if } q > 1.$$

Proof. We know (see Example 11 in Sect. 3.1) that $\lim_{n\to\infty}\frac{n}{q^{n}} = 0$ if $q > 1$.

Now, as in Example 3 of Sect. 3.1, we can consider the auxiliary mapping $f : \mathbb{R}_+ \to \mathbb{N}$ given by the function $[x]$ (the integer part of $x$). Using the inequalities

$$\frac{1}{q}\cdot\frac{[x]}{q^{[x]}} < \frac{x}{q^{x}} < \frac{[x]+1}{q^{[x]+1}}\cdot q$$

and taking account of the theorem on the limit of a composite function, we find that the extreme terms here tend to 0 as $x \to +\infty$. We conclude that $\lim_{x\to+\infty}\frac{x}{q^{x}} = 0$. □


Example 23.

$$\lim_{x\to+\infty}\frac{\log_a x}{x} = 0.$$

Proof. Let $a > 1$. Set $t = \log_a x$, so that $x = a^{t}$. From the properties of the exponential function and the logarithm (taking account of the unboundedness of $a^{n}$ for $n \in \mathbb{N}$) we have $(x \to +\infty) \Leftrightarrow (t \to +\infty)$. Using the theorem on the limit of a composite function and the result of Example 11 of Sect. 3.1, we obtain

$$\lim_{x\to+\infty}\frac{\log_a x}{x} = \lim_{t\to+\infty}\frac{t}{a^{t}} = 0.$$

If $0 < a < 1$ we set $-t = \log_a x$, $x = a^{-t}$. Then $(x \to +\infty) \Leftrightarrow (t \to +\infty)$, and since $1/a > 1$, we again have

$$\lim_{x\to+\infty}\frac{\log_a x}{x} = \lim_{t\to+\infty}\frac{-t}{a^{-t}} = -\lim_{t\to+\infty}\frac{t}{(1/a)^{t}} = 0.\ \square$$


c. The Limit of a Monotonic Function We now consider a special class of 
numerical-valued functions, but one that is very useful, namely the monotonic 
functions. 


Definition 17. A function $f : E \to \mathbb{R}$ defined on a set $E \subset \mathbb{R}$ is said to be

increasing on $E$ if
$$\forall x_1, x_2 \in E\ \bigl(x_1 < x_2 \Rightarrow f(x_1) < f(x_2)\bigr);$$
nondecreasing on $E$ if
$$\forall x_1, x_2 \in E\ \bigl(x_1 < x_2 \Rightarrow f(x_1) \le f(x_2)\bigr);$$
nonincreasing on $E$ if
$$\forall x_1, x_2 \in E\ \bigl(x_1 < x_2 \Rightarrow f(x_1) \ge f(x_2)\bigr);$$
decreasing on $E$ if
$$\forall x_1, x_2 \in E\ \bigl(x_1 < x_2 \Rightarrow f(x_1) > f(x_2)\bigr).$$

Functions of the types just listed are said to be monotonic on the set $E$.

Assume that the numbers (or symbols $-\infty$ or $+\infty$) $i = \inf E$ and $s = \sup E$ are limit points of the set $E$, and let $f : E \to \mathbb{R}$ be a monotonic function on $E$. Then the following theorem holds.


Theorem 6. (Criterion for the existence of a limit of a monotonic function). A necessary and sufficient condition for a function $f : E \to \mathbb{R}$ that is nondecreasing on the set $E$ to have a limit as $x \to s$, $x \in E$, is that it be bounded above. For this function to have a limit as $x \to i$, $x \in E$, it is necessary and sufficient that it be bounded below.


Proof. We shall prove this theorem for the limit $\lim_{E\ni x\to s} f(x)$.

If this limit exists, then, like any function having a limit, the function $f$ is ultimately bounded over the base $E \ni x \to s$.

Since $f$ is nondecreasing on $E$, it follows that $f$ is bounded above. In fact, we can even assert that $f(x) \le \lim_{E\ni x\to s} f(x)$. That will be clear from what follows.

Let us pass to the proof of the existence of the limit $\lim_{E\ni x\to s} f(x)$ when $f$ is bounded above.

Given that $f$ is bounded above, we see that there is a least upper bound of the values that the function assumes on $E$. Let $A = \sup_{x\in E} f(x)$. We shall show that $\lim_{E\ni x\to s} f(x) = A$. Given $\varepsilon > 0$, we use the definition of the least upper bound to find a point $x_0 \in E$ for which $A - \varepsilon < f(x_0) \le A$. Then, since $f$ is nondecreasing on $E$, we have $A - \varepsilon < f(x) \le A$ for $x_0 < x \in E$. But the set $\{x \in E \mid x_0 < x\}$ is obviously an element of the base $x \to s$, $x \in E$ (since $s = \sup E$). Thus we have proved that $\lim_{E\ni x\to s} f(x) = A$.

For the limit $\lim_{E\ni x\to i} f(x)$ the reasoning is analogous. In this case we have

$$\lim_{E\ni x\to i} f(x) = \inf_{x\in E} f(x).\ \square$$
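The key step of the proof, choosing a point $x_0$ with $A - \varepsilon < f(x_0) \le A$ and then using monotonicity on the tail $\{x \in E \mid x_0 < x\}$, can be traced on a concrete function. The sketch below assumes the hypothetical example $f(x) = x/(1+x)$ on $E = [0, +\infty)$, for which $A = \sup f = 1$; neither the function nor the helper names come from the text.

```python
def f(x):
    """Hypothetical nondecreasing function on E = [0, +inf), bounded above by 1."""
    return x / (1.0 + x)

A = 1.0  # least upper bound of f on E (known in closed form for this example)

def witness_point(eps):
    """Return a point x0 in E with A - eps < f(x0) <= A."""
    # f(1/eps) = 1/(1 + eps) > 1 - eps, because (1 - eps)(1 + eps) < 1.
    return 1.0 / eps

def tail_is_close(eps, offsets):
    """Check A - eps < f(x) <= A at sample points x = x0 + offset, offset >= 0."""
    x0 = witness_point(eps)
    return all(A - eps < f(x0 + d) <= A for d in offsets)

print(tail_is_close(1e-3, [0.0, 1.0, 10.0, 1e6]))
```

Because $f$ is nondecreasing, checking the inequality at $x_0$ already forces it on the whole tail; the sampled offsets merely illustrate that.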


d. Comparison of the Asymptotic Behavior of Functions We begin 
this discussion with some examples to clarify the subject. 

Let $\pi(x)$ be the number of primes not larger than a given number $x \in \mathbb{R}$. Although for any fixed $x$ we can find (if only by explicit enumeration) the value of $\pi(x)$, we are nevertheless not in a position to say, for example, how the function $\pi(x)$ behaves as $x \to +\infty$, or, what is the same, what the asymptotic law of distribution of prime numbers is. We have known since the time of Euclid that $\pi(x) \to +\infty$ as $x \to +\infty$, but the proof that $\pi(x)$ grows approximately like $\frac{x}{\ln x}$ was achieved only in the nineteenth century by P. L. Chebyshev.¹³

When it becomes necessary to describe the behavior of a function near 
some point (or near infinity) at which, as a rule, the function itself is not de- 
fined, we say that we are interested in the asymptotics or asymptotic behavior 
of the function in a neighborhood of the point. 

The asymptotic behavior of a function is usually characterized using a 
second function that is simpler or better studied and which reproduces the 
values of the function being studied in a neighborhood of the point in question 
with small relative error. 

Thus, as $x \to +\infty$, the function $\pi(x)$ behaves like $\frac{x}{\ln x}$; as $x \to 0$, the function $\frac{\sin x}{x}$ behaves like the constant function 1. When we speak of the behavior of the function $x^{2} + x + \sin\frac{1}{x}$ as $x \to \infty$, we shall obviously say that it behaves basically like $x^{2}$, while in speaking of its behavior as $x \to 0$, we shall say it behaves like $\sin\frac{1}{x}$.

We now give precise definitions of some elementary concepts involving 
the asymptotic behavior of functions. We shall make systematic use of these 
concepts at the very first stage of our study of analysis. 


Definition 18. We shall say that a certain property of functions or a certain relation between functions holds ultimately over a given base $\mathcal{B}$ if there exists $B \in \mathcal{B}$ on which it holds.


We have already interpreted the notion of a function that is ultimately 
constant or ultimately bounded in a given base in this sense. In the same 
sense we shall say from now on that the relation f(x) = g(x)h(x) holds ulti- 
mately between functions f, g, and h. These functions may have at the outset 
different domains of definition, but if we are interested in their asymptotic 
behavior over the base B, all that matters to us is that they are all defined 
on some element of B. 


Definition 19. The function $f$ is said to be infinitesimal compared with the function $g$ over the base $\mathcal{B}$, and we write $f \underset{\mathcal{B}}{=} o(g)$ or $f = o(g)$ over $\mathcal{B}$ if the relation $f(x) = \alpha(x)g(x)$ holds ultimately over $\mathcal{B}$, where $\alpha(x)$ is a function that is infinitesimal over $\mathcal{B}$.


Example 24. $x^{2} = o(x)$ as $x \to 0$, since $x^{2} = x \cdot x$.


Example 25. $x = o(x^{2})$ as $x \to \infty$, since ultimately (as long as $x \neq 0$), $x = \frac{1}{x}\cdot x^{2}$.


13 P. L. Chebyshev (1821-1894), outstanding Russian mathematician and specialist in theoretical mechanics, the founder of a large mathematical school in Russia.

From these examples one must conclude that it is absolutely necessary to indicate the base over which $f = o(g)$.

The notation f = o(g) is read “f is little-oh of g”. 

It follows from the definition, in particular, that the notation $f \underset{\mathcal{B}}{=} o(1)$, which results when $g(x) \equiv 1$, means simply that $f$ is infinitesimal over $\mathcal{B}$.


Definition 20. If f = o(g) and g is itself infinitesimal over B, we say that f 
is an infinitesimal of higher order than g over B. 


Example 26. $x^{-2} = \frac{1}{x^{2}}$ is an infinitesimal of higher order than $x^{-1} = \frac{1}{x}$ as $x \to \infty$.


Definition 21. A function that tends to infinity over a given base is said to 
be an infinite function or simply an infinity over the given base. 
Definition 22. If f and g are infinite functions over B and f = o(g), we say 
that g is a higher order infinity than f over B. 

Example 27. $\frac{1}{x} \to \infty$ as $x \to 0$, $\frac{1}{x^{2}} \to \infty$ as $x \to 0$, and $\frac{1}{x} = o\left(\frac{1}{x^{2}}\right)$. Therefore $\frac{1}{x^{2}}$ is a higher order infinity than $\frac{1}{x}$ as $x \to 0$.

At the same time, as $x \to \infty$, $x^{2}$ is a higher order infinity than $x$.

It should not be thought that we can characterize the order of every infinity or infinitesimal by choosing some power $x^{n}$ and saying that it is of order $n$.


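Example 28. We shall show that for $a > 1$ and any $n \in \mathbb{Z}$

$$\lim_{x\to+\infty}\frac{x^{n}}{a^{x}} = 0,$$

that is, $x^{n} = o(a^{x})$ as $x \to +\infty$.

Proof. If $n \le 0$ the assertion is obvious. If $n \in \mathbb{N}$, then, setting $q = \sqrt[n]{a}$, we have $q > 1$ and $\frac{x^{n}}{a^{x}} = \left(\frac{x}{q^{x}}\right)^{n}$, and therefore

$$\lim_{x\to+\infty}\frac{x^{n}}{a^{x}} = \lim_{x\to+\infty}\left(\frac{x}{q^{x}}\right)^{n} = \underbrace{\lim_{x\to+\infty}\frac{x}{q^{x}}\cdot\ldots\cdot\lim_{x\to+\infty}\frac{x}{q^{x}}}_{n\ \text{factors}} = 0.$$

We have used (with induction) the theorem on the limit of a product and the result of Example 22. □

Thus, for any $n \in \mathbb{Z}$ we obtain $x^{n} = o(a^{x})$ as $x \to +\infty$ if $a > 1$.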


Example 29. Extending the preceding example, let us show that

$$\lim_{x\to+\infty}\frac{x^{\alpha}}{a^{x}} = 0$$

for $a > 1$ and any $\alpha \in \mathbb{R}$, that is, $x^{\alpha} = o(a^{x})$ as $x \to +\infty$.

Proof. Indeed, let us choose $n \in \mathbb{N}$ such that $n > \alpha$. Then for $x > 1$ we obtain

$$0 < \frac{x^{\alpha}}{a^{x}} < \frac{x^{n}}{a^{x}}.$$

Using properties of the limit and the result of the preceding example, we find that $\lim_{x\to+\infty}\frac{x^{\alpha}}{a^{x}} = 0$. □


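Example 30. Let us show that

$$\lim_{\mathbb{R}_+\ni x\to 0}\frac{a^{-1/x}}{x^{\alpha}} = 0$$

for $a > 1$ and any $\alpha \in \mathbb{R}$, that is, $a^{-1/x} = o(x^{\alpha})$ as $x \to 0$, $x \in \mathbb{R}_+$.

Proof. Setting $x = 1/t$ in this case (so that $t \to +\infty$ as $\mathbb{R}_+ \ni x \to 0$) and using the theorem on the limit of a composite function and the result of the preceding example, we find

$$\lim_{\mathbb{R}_+\ni x\to 0}\frac{a^{-1/x}}{x^{\alpha}} = \lim_{t\to+\infty}\frac{t^{\alpha}}{a^{t}} = 0.\ \square$$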


Example 31. Let us show that

$$\lim_{x\to+\infty}\frac{\log_a x}{x^{\alpha}} = 0$$

for $\alpha > 0$, that is, for any positive exponent $\alpha$ we have $\log_a x = o(x^{\alpha})$ as $x \to +\infty$.

Proof. If $a > 1$, we set $x = a^{t/\alpha}$. Then by the properties of power functions and the logarithm, the theorem on the limit of a composite function, and the result of Example 29, we find

$$\lim_{x\to+\infty}\frac{\log_a x}{x^{\alpha}} = \lim_{t\to+\infty}\frac{t/\alpha}{a^{t}} = \frac{1}{\alpha}\lim_{t\to+\infty}\frac{t}{a^{t}} = 0.$$

If $0 < a < 1$, then $1/a > 1$, and after the substitution $x = a^{-t/\alpha}$ we obtain

$$\lim_{x\to+\infty}\frac{\log_a x}{x^{\alpha}} = \lim_{t\to+\infty}\frac{-t/\alpha}{a^{-t}} = -\frac{1}{\alpha}\lim_{t\to+\infty}\frac{t}{(1/a)^{t}} = 0.\ \square$$
Example 32. Let us show further that

$$x^{\alpha}\log_a x = o(1) \quad\text{as } x \to 0,\ x \in \mathbb{R}_+$$

for any $\alpha > 0$.

Proof. We need to show that $\lim_{\mathbb{R}_+\ni x\to 0} x^{\alpha}\log_a x = 0$ for $\alpha > 0$. Setting $x = 1/t$ and applying the theorem on the limit of a composite function and the result of the preceding example, we find

$$\lim_{\mathbb{R}_+\ni x\to 0} x^{\alpha}\log_a x = \lim_{t\to+\infty}\frac{\log_a(1/t)}{t^{\alpha}} = -\lim_{t\to+\infty}\frac{\log_a t}{t^{\alpha}} = 0.\ \square$$
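The comparisons established in Examples 29, 31 and 32 (powers are dominated by exponentials, logarithms by powers) can be observed numerically. The sketch below is an illustration only; the function names and the sample parameters are assumptions made for it, and the first quotient is evaluated through logarithms to avoid floating-point overflow.

```python
import math

def power_over_exp(x, alpha, a):
    """x**alpha / a**x for x > 0, a > 1 (Example 29), computed via logarithms."""
    return math.exp(alpha * math.log(x) - x * math.log(a))

def log_over_power(x, alpha, a):
    """log_a(x) / x**alpha for x > 0, alpha > 0 (Example 31)."""
    return math.log(x, a) / x ** alpha

def power_times_log(x, alpha, a):
    """x**alpha * log_a(x) for x > 0, alpha > 0 (Example 32)."""
    return x ** alpha * math.log(x, a)

for x in (1e1, 1e2, 1e3, 1e4):
    # The first two quotients are examined as x -> +infinity, the product as 1/x -> +0.
    print(x, power_over_exp(x, 10.0, 1.1), log_over_power(x, 0.5, 2.0),
          power_times_log(1.0 / x, 0.5, 2.0))
```

Note that $x^{10}/1.1^{x}$ first grows before it starts to decay; smallness "ultimately over the base" says nothing about moderate values of $x$.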


Definition 23. Let us agree that the notation $f \underset{\mathcal{B}}{=} O(g)$ or $f = O(g)$ over the base $\mathcal{B}$ (read “f is big-oh of g over $\mathcal{B}$”) means that the relation $f(x) = \beta(x)g(x)$ holds ultimately over $\mathcal{B}$ where $\beta(x)$ is ultimately bounded over $\mathcal{B}$.

In particular $f = O(1)$ means that the function $f$ is ultimately bounded over $\mathcal{B}$.

Example 33. $\left(\frac{1}{x} + \sin x\right)x = O(x)$ as $x \to \infty$.

Definition 24. The functions $f$ and $g$ are of the same order over $\mathcal{B}$, and we write $f \asymp g$ over $\mathcal{B}$, if $f = O(g)$ and $g = O(f)$ simultaneously.

Example 34. The functions $(2 + \sin x)x$ and $x$ are of the same order as $x \to \infty$, but $(1 + \sin x)x$ and $x$ are not of the same order as $x \to \infty$.


The condition that $f$ and $g$ be of the same order over the base $\mathcal{B}$ is obviously equivalent to the condition that there exist $c_1 > 0$ and $c_2 > 0$ and an element $B \in \mathcal{B}$ such that the relations

$$c_1|g(x)| \le |f(x)| \le c_2|g(x)|$$

hold on $B$, or, what is the same,

$$\frac{1}{c_2}|f(x)| \le |g(x)| \le \frac{1}{c_1}|f(x)|.$$
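The two-sided estimate $c_1|g(x)| \le |f(x)| \le c_2|g(x)|$ can be probed on Example 34 by sampling the quotient $|f(x)|/|g(x)|$. This is a rough numerical sketch under assumed sample points (a fine grid over a few periods of the sine), not a verification; `ratio_bounds` is a name introduced here.

```python
import math

def ratio_bounds(factor, xs):
    """Smallest and largest sampled value of |factor(x) * x| / |x| over nonzero xs."""
    ratios = [abs(factor(x) * x) / abs(x) for x in xs]
    return min(ratios), max(ratios)

# Assumed sample: a grid of step 0.001 on [1000, 1020), about three periods of sin.
xs = [1000.0 + 0.001 * k for k in range(20000)]

# (2 + sin x) x versus x: the quotient stays between c1 = 1 and c2 = 3.
print(ratio_bounds(lambda x: 2 + math.sin(x), xs))
# (1 + sin x) x versus x: the quotient comes arbitrarily close to 0, so no c1 > 0 works.
print(ratio_bounds(lambda x: 1 + math.sin(x), xs))
```

In the second case $f = O(g)$ still holds with $c_2 = 2$; it is the lower estimate, that is $g = O(f)$, that fails.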


Definition 25. If the relation $f(x) = \gamma(x)g(x)$ holds ultimately over $\mathcal{B}$ where $\lim_{\mathcal{B}}\gamma(x) = 1$, we say that the function $f$ behaves asymptotically like $g$ over $\mathcal{B}$, or, more briefly, that $f$ is equivalent to $g$ over $\mathcal{B}$.

In this case we shall write $f \underset{\mathcal{B}}{\sim} g$ or $f \sim g$ over $\mathcal{B}$.


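The use of the word equivalent is justified by the relations

$$\bigl(f \underset{\mathcal{B}}{\sim} f\bigr),$$
$$\bigl(f \underset{\mathcal{B}}{\sim} g\bigr) \Rightarrow \bigl(g \underset{\mathcal{B}}{\sim} f\bigr),$$
$$\bigl(f \underset{\mathcal{B}}{\sim} g\bigr) \wedge \bigl(g \underset{\mathcal{B}}{\sim} h\bigr) \Rightarrow \bigl(f \underset{\mathcal{B}}{\sim} h\bigr).$$

Indeed, the relation $f \underset{\mathcal{B}}{\sim} f$ is obvious, since in this case $\gamma(x) \equiv 1$. Next, if $\lim_{\mathcal{B}}\gamma(x) = 1$, then $\lim_{\mathcal{B}}\frac{1}{\gamma(x)} = 1$ and $g(x) = \frac{1}{\gamma(x)}f(x)$. Here all that we need to explain is why it is permissible to assume that $\gamma(x) \neq 0$. If the relation $f(x) = \gamma(x)g(x)$ holds on $B_1 \in \mathcal{B}$, and $\frac{1}{2} < |\gamma(x)| < \frac{3}{2}$ on $B_2 \in \mathcal{B}$, then we can take $B \in \mathcal{B}$ with $B \subset B_1 \cap B_2$, on which both relations hold. Outside of $B$, if convenient, we may assume that $\gamma(x) \equiv 1$. Thus we do indeed have $\bigl(f \underset{\mathcal{B}}{\sim} g\bigr) \Rightarrow \bigl(g \underset{\mathcal{B}}{\sim} f\bigr)$.

Finally, if $f(x) = \gamma_1(x)g(x)$ on $B_1 \in \mathcal{B}$ and $g(x) = \gamma_2(x)h(x)$ on $B_2 \in \mathcal{B}$, then on an element $B \in \mathcal{B}$ such that $B \subset B_1 \cap B_2$, both of these relations hold simultaneously, and so $f(x) = \gamma_1(x)\gamma_2(x)h(x)$ on $B$. But $\lim_{\mathcal{B}}\gamma_1(x)\gamma_2(x) = \lim_{\mathcal{B}}\gamma_1(x)\cdot\lim_{\mathcal{B}}\gamma_2(x) = 1$, and hence we have verified that $f \underset{\mathcal{B}}{\sim} h$.

It is useful to note that since the relation $\lim_{\mathcal{B}}\gamma(x) = 1$ is equivalent to $\gamma(x) = 1 + \alpha(x)$, where $\lim_{\mathcal{B}}\alpha(x) = 0$, the relation $f \underset{\mathcal{B}}{\sim} g$ is equivalent to

$$f(x) = g(x) + \alpha(x)g(x) = g(x) + o(g(x)) \quad\text{over } \mathcal{B}.$$

We see that the relative error $|\alpha(x)| = \left|\frac{f(x) - g(x)}{g(x)}\right|$ in approximating $f(x)$ by a function $g(x)$ that is equivalent to $f(x)$ over $\mathcal{B}$ is infinitesimal over $\mathcal{B}$.

Let us now consider some examples.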


Example 35. $x^{2} + x = \left(1 + \frac{1}{x}\right)x^{2} \sim x^{2}$ as $x \to \infty$.

The absolute value of the difference of these functions

$$|(x^{2} + x) - x^{2}| = |x|$$

tends to infinity. However, the relative error $\frac{|x|}{x^{2}} = \frac{1}{|x|}$ that results from replacing $x^{2} + x$ by the equivalent function $x^{2}$ tends to zero as $x \to \infty$.


Example 36. At the beginning of this discussion we spoke of the famous asymptotic law of distribution of the prime numbers. We can now give a precise statement of this law:

$$\pi(x) = \frac{x}{\ln x} + o\left(\frac{x}{\ln x}\right) \quad\text{as } x \to +\infty.$$
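For small arguments the prime-counting function and the comparison function $x/\ln x$ can be tabulated directly. The sketch below counts primes with a sieve of Eratosthenes (a standard technique chosen here for illustration; the name `prime_count` is hypothetical) and prints the quotient, which the asymptotic law asserts tends to 1.

```python
import math

def prime_count(limit):
    """Number of primes not exceeding `limit`, via a sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out the multiples p*p, p*p + p, ... up to limit.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sum(is_prime)

for x in (10**3, 10**4, 10**5, 10**6):
    count = prime_count(x)
    print(x, count, count / (x / math.log(x)))
```

The quotient approaches 1 very slowly, so such a table illustrates the statement without lending it any evidential weight.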


Example 37. Since $\lim_{x\to 0}\frac{\sin x}{x} = 1$, we have $\sin x \sim x$ as $x \to 0$, which can also be written as $\sin x = x + o(x)$ as $x \to 0$.


Example 38. Let us show that $\ln(1 + x) \sim x$ as $x \to 0$.

Proof.

$$\lim_{x\to 0}\frac{\ln(1+x)}{x} = \lim_{x\to 0}\ln(1+x)^{1/x} = \ln\Bigl(\lim_{x\to 0}(1+x)^{1/x}\Bigr) = \ln e = 1.$$

Here we have used the relation $\log_a(b^{\alpha}) = \alpha\log_a b$ in the first equality and the relation $\lim_{t\to b}\log_a t = \log_a b = \log_a\bigl(\lim_{t\to b} t\bigr)$ in the second. □

Thus, $\ln(1 + x) = x + o(x)$ as $x \to 0$.

Example 39. Let us show that $e^{x} = 1 + x + o(x)$ as $x \to 0$.

Proof.

$$\lim_{x\to 0}\frac{e^{x} - 1}{x} = \lim_{t\to 0}\frac{t}{\ln(1+t)} = 1.$$

Here we have made the substitution $x = \ln(1 + t)$, $e^{x} - 1 = t$ and used the relations $e^{x} \to e^{0} = 1$ as $x \to 0$ and $e^{x} \neq 1$ for $x \neq 0$. Thus, using the theorem on the limit of a composite function and the result of the preceding example, we have proved the assertion. □

Thus, $e^{x} - 1 \sim x$ as $x \to 0$.
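Example 40. Let us show that $(1 + x)^{\alpha} = 1 + \alpha x + o(x)$ as $x \to 0$.

Proof.

$$\lim_{x\to 0}\frac{(1+x)^{\alpha} - 1}{x} = \lim_{x\to 0}\frac{e^{\alpha\ln(1+x)} - 1}{\alpha\ln(1+x)}\cdot\frac{\alpha\ln(1+x)}{x} = \alpha\lim_{t\to 0}\frac{e^{t} - 1}{t}\cdot\lim_{x\to 0}\frac{\ln(1+x)}{x} = \alpha.$$

In this computation, assuming $\alpha \neq 0$, we made the substitution $\alpha\ln(1 + x) = t$ and used the results of the two preceding examples.

If $\alpha = 0$, the assertion is obvious. □

Thus, $(1 + x)^{\alpha} - 1 \sim \alpha x$ as $x \to 0$.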
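The equivalences of Examples 37 to 40 say that certain quotients tend to 1 as $x \to 0$. A quick numerical sketch follows; the helper name and the choice $\alpha = 0.5$ are assumptions for illustration, and the standard-library functions `math.log1p` and `math.expm1` are used to limit cancellation errors near zero.

```python
import math

def equivalence_ratios(x, alpha=0.5):
    """Quotients f(x)/g(x) for the pairs f ~ g of Examples 37-40 (x != 0, x > -1)."""
    return {
        "sin x / x": math.sin(x) / x,
        "ln(1+x) / x": math.log1p(x) / x,
        "(e^x - 1) / x": math.expm1(x) / x,
        # (1+x)^alpha - 1 is computed as expm1(alpha * log1p(x)).
        "((1+x)^alpha - 1) / (alpha x)": math.expm1(alpha * math.log1p(x)) / (alpha * x),
    }

for x in (1e-1, 1e-3, -1e-3, 1e-6):
    print(x, equivalence_ratios(x))
```

Each quotient differs from 1 by a quantity that shrinks with $|x|$, which is the relative error $|\alpha(x)|$ discussed after Definition 25.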
The following simple fact is sometimes useful in computing limits. 


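Proposition 3. If $f \underset{\mathcal{B}}{\sim} \tilde{f}$, then $\lim_{\mathcal{B}} f(x)g(x) = \lim_{\mathcal{B}} \tilde{f}(x)g(x)$, provided one of these limits exists.

Proof. Indeed, given that $f(x) = \gamma(x)\tilde{f}(x)$ and $\lim_{\mathcal{B}}\gamma(x) = 1$, we have

$$\lim_{\mathcal{B}} f(x)g(x) = \lim_{\mathcal{B}}\gamma(x)\tilde{f}(x)g(x) = \lim_{\mathcal{B}}\gamma(x)\cdot\lim_{\mathcal{B}}\tilde{f}(x)g(x) = \lim_{\mathcal{B}}\tilde{f}(x)g(x).\ \square$$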


Example 41.

$$\lim_{x\to 0}\frac{\ln\cos x}{\sin(x^{2})} = \frac{1}{2}\lim_{x\to 0}\frac{\ln\cos^{2}x}{x^{2}} = \frac{1}{2}\lim_{x\to 0}\frac{\ln(1 - \sin^{2}x)}{x^{2}} = \frac{1}{2}\lim_{x\to 0}\frac{-\sin^{2}x}{x^{2}} = -\frac{1}{2}\lim_{x\to 0}\frac{x^{2}}{x^{2}} = -\frac{1}{2}.$$

Here we have used the relations $\ln(1 + \alpha) \sim \alpha$ as $\alpha \to 0$, $\sin x \sim x$ as $x \to 0$, $\frac{1}{\sin\beta} \sim \frac{1}{\beta}$ as $\beta \to 0$, and $\sin^{2}x \sim x^{2}$ as $x \to 0$.

We have proved that one may replace functions by other functions equivalent to them in a given base when computing limits of monomials. This rule should not be extended to sums and differences of functions.


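Example 42. $\sqrt{x^{2} + x} \sim x$ as $x \to +\infty$, but

$$\lim_{x\to+\infty}\bigl(\sqrt{x^{2} + x} - x\bigr) \neq \lim_{x\to+\infty}(x - x) = 0.$$

In fact,

$$\lim_{x\to+\infty}\bigl(\sqrt{x^{2} + x} - x\bigr) = \lim_{x\to+\infty}\frac{x}{\sqrt{x^{2} + x} + x} = \lim_{x\to+\infty}\frac{1}{\sqrt{1 + \frac{1}{x}} + 1} = \frac{1}{2}.$$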


We note one more widely used rule for handling the symbols o(-) and O(-) 
in analysis. 


Proposition 4. For a given base

a) $o(f) + o(f) = o(f)$;

b) $o(f)$ is also $O(f)$;

c) $o(f) + O(f) = O(f)$;

d) $O(f) + O(f) = O(f)$;

e) if $g(x) \neq 0$, then $\frac{o(f(x))}{g(x)} = o\left(\frac{f(x)}{g(x)}\right)$ and $\frac{O(f(x))}{g(x)} = O\left(\frac{f(x)}{g(x)}\right)$.

Notice some peculiarities of operations with the symbols $o(\cdot)$ and $O(\cdot)$ that follow from the meaning of these symbols. For example $2o(f) = o(f)$ and $o(f) + O(f) = O(f)$ (even though in general $o(f) \neq 0$); also, $o(f) = O(f)$, but $O(f) \neq o(f)$. Here the equality sign is used in the sense of “is”. The symbols $o(\cdot)$ and $O(\cdot)$ do not really denote a function, but rather indicate its asymptotic behavior, a behavior that many functions may have simultaneously, for example, $f$ and $2f$, and the like.


Proof. a) After the clarification just given, this assertion ceases to appear strange. The first symbol $o(f)$ in it denotes a function of the form $\alpha_1(x)f(x)$, where $\lim_{\mathcal{B}}\alpha_1(x) = 0$. The second symbol $o(f)$, which one can (or should) equip with some mark to distinguish it from the first, denotes a function of the form $\alpha_2(x)f(x)$, where $\lim_{\mathcal{B}}\alpha_2(x) = 0$. Then $\alpha_1(x)f(x) + \alpha_2(x)f(x) = (\alpha_1(x) + \alpha_2(x))f(x) = \alpha_3(x)f(x)$, where $\lim_{\mathcal{B}}\alpha_3(x) = 0$.

Assertion b) follows from the fact that any function having a limit is ultimately bounded.

Assertion c) follows from b) and d).

Assertion d) follows from the fact that the sum of ultimately bounded functions is ultimately bounded.

As for e), we have $\frac{o(f(x))}{g(x)} = \frac{\alpha(x)f(x)}{g(x)} = \alpha(x)\frac{f(x)}{g(x)} = o\left(\frac{f(x)}{g(x)}\right)$.

The second part of assertion e) is verified similarly. □

Using these rules and the equivalences obtained in Example 40, we can now find the limit in Example 42 by the following direct method:

$$\lim_{x\to+\infty}\bigl(\sqrt{x^{2} + x} - x\bigr) = \lim_{x\to+\infty} x\left(\sqrt{1 + \frac{1}{x}} - 1\right) = \lim_{x\to+\infty} x\left(1 + \frac{1}{2}\cdot\frac{1}{x} + o\left(\frac{1}{x}\right) - 1\right) =$$

$$= \lim_{x\to+\infty}\left(\frac{1}{2} + x\cdot o\left(\frac{1}{x}\right)\right) = \lim_{x\to+\infty}\left(\frac{1}{2} + o(1)\right) = \frac{1}{2}.$$


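We shall soon prove the following important relations, which should be memorized at this point like the multiplication table:

$$e^{x} = 1 + \frac{1}{1!}x + \frac{1}{2!}x^{2} + \cdots + \frac{1}{n!}x^{n} + \cdots \quad\text{for } x \in \mathbb{R},$$
$$\cos x = 1 - \frac{1}{2!}x^{2} + \frac{1}{4!}x^{4} - \cdots + \frac{(-1)^{k}}{(2k)!}x^{2k} + \cdots \quad\text{for } x \in \mathbb{R},$$
$$\sin x = \frac{1}{1!}x - \frac{1}{3!}x^{3} + \cdots + \frac{(-1)^{k}}{(2k+1)!}x^{2k+1} + \cdots \quad\text{for } x \in \mathbb{R},$$
$$\ln(1 + x) = x - \frac{1}{2}x^{2} + \frac{1}{3}x^{3} - \cdots + \frac{(-1)^{n-1}}{n}x^{n} + \cdots \quad\text{for } |x| < 1,$$
$$(1 + x)^{\alpha} = 1 + \frac{\alpha}{1!}x + \frac{\alpha(\alpha-1)}{2!}x^{2} + \cdots + \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}x^{n} + \cdots \quad\text{for } |x| < 1.$$

On the one hand, these relations can already be used as computational formulas, and on the other hand they contain the following asymptotic formulas, which generalize the formulas contained in Examples 37-40:

$$e^{x} = 1 + \frac{1}{1!}x + \frac{1}{2!}x^{2} + \cdots + \frac{1}{n!}x^{n} + O(x^{n+1}) \quad\text{as } x \to 0,$$
$$\cos x = 1 - \frac{1}{2!}x^{2} + \frac{1}{4!}x^{4} - \cdots + \frac{(-1)^{k}}{(2k)!}x^{2k} + O(x^{2k+2}) \quad\text{as } x \to 0,$$
$$\sin x = \frac{1}{1!}x - \frac{1}{3!}x^{3} + \cdots + \frac{(-1)^{k}}{(2k+1)!}x^{2k+1} + O(x^{2k+3}) \quad\text{as } x \to 0,$$
$$\ln(1 + x) = x - \frac{1}{2}x^{2} + \frac{1}{3}x^{3} - \cdots + \frac{(-1)^{n-1}}{n}x^{n} + O(x^{n+1}) \quad\text{as } x \to 0,$$
$$(1 + x)^{\alpha} = 1 + \frac{\alpha}{1!}x + \frac{\alpha(\alpha-1)}{2!}x^{2} + \cdots + \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}x^{n} + O(x^{n+1}) \quad\text{as } x \to 0.$$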
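What the $O(\cdot)$ terms in these asymptotic formulas assert is that the remainder, divided by the indicated power of $x$, stays bounded near 0. The sketch below checks this for a few low-order truncations; the choice of truncation orders and the name `scaled_remainders` are assumptions made for the illustration.

```python
import math

def scaled_remainders(x):
    """Remainder of each truncated expansion divided by the power in its O-term (x != 0)."""
    return {
        "exp, n=2": (math.exp(x) - (1 + x + x**2 / 2)) / x**3,
        "cos, k=1": (math.cos(x) - (1 - x**2 / 2)) / x**4,
        "sin, k=1": (math.sin(x) - (x - x**3 / 6)) / x**5,
        "ln(1+x), n=2": (math.log1p(x) - (x - x**2 / 2)) / x**3,
    }

for x in (0.1, 0.01):
    print(x, scaled_remainders(x))
```

For very small $x$ the numerators suffer from floating-point cancellation, so the experiment is only meaningful for moderately small arguments such as those used here.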


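These formulas are usually the most efficient method of finding the limits of the elementary functions. When doing so, it is useful to keep in mind that $O(x^{m+1}) = x^{m+1}\cdot O(1) = x^{m}\cdot xO(1) = x^{m}o(1) = o(x^{m})$ as $x \to 0$.

In conclusion, let us consider a few examples showing these formulas in action.


Example 43.

$$\lim_{x\to 0}\frac{x - \sin x}{x^{3}} = \lim_{x\to 0}\frac{x - \bigl(x - \frac{1}{3!}x^{3} + O(x^{5})\bigr)}{x^{3}} = \lim_{x\to 0}\left(\frac{1}{3!} + O(x^{2})\right) = \frac{1}{3!}.$$


Example 44. Let us find

$$\lim_{x\to\infty} x^{2}\left(\sqrt[7]{\frac{x^{3} + x}{1 + x^{3}}} - \cos\frac{1}{x}\right).$$

As $x \to \infty$ we have

$$\frac{x^{3} + x}{1 + x^{3}} = \frac{1 + x^{-2}}{1 + x^{-3}} = \left(1 + \frac{1}{x^{2}}\right)\left(1 + \frac{1}{x^{3}}\right)^{-1} = \left(1 + \frac{1}{x^{2}}\right)\left(1 - \frac{1}{x^{3}} + O\left(\frac{1}{x^{6}}\right)\right) = 1 + \frac{1}{x^{2}} + O\left(\frac{1}{x^{3}}\right),$$

$$\sqrt[7]{\frac{x^{3} + x}{1 + x^{3}}} = \left(1 + \frac{1}{x^{2}} + O\left(\frac{1}{x^{3}}\right)\right)^{1/7} = 1 + \frac{1}{7}\cdot\frac{1}{x^{2}} + O\left(\frac{1}{x^{3}}\right),$$

$$\cos\frac{1}{x} = 1 - \frac{1}{2!}\cdot\frac{1}{x^{2}} + O\left(\frac{1}{x^{4}}\right),$$

from which we obtain

$$\sqrt[7]{\frac{x^{3} + x}{1 + x^{3}}} - \cos\frac{1}{x} = \frac{9}{14}\cdot\frac{1}{x^{2}} + O\left(\frac{1}{x^{3}}\right) \quad\text{as } x \to \infty.$$

Hence the required limit is

$$\lim_{x\to\infty} x^{2}\left(\frac{9}{14x^{2}} + O\left(\frac{1}{x^{3}}\right)\right) = \frac{9}{14}.$$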

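Example 45.

$$\lim_{x\to\infty}\left[\frac{1}{e}\left(1 + \frac{1}{x}\right)^{x}\right]^{x} = \lim_{x\to\infty}\exp\left\{x\left(\ln\left(1 + \frac{1}{x}\right)^{x} - 1\right)\right\} = \lim_{x\to\infty}\exp\left\{x^{2}\ln\left(1 + \frac{1}{x}\right) - x\right\} =$$

$$= \lim_{x\to\infty}\exp\left\{x^{2}\left(\frac{1}{x} - \frac{1}{2x^{2}} + O\left(\frac{1}{x^{3}}\right)\right) - x\right\} = \lim_{x\to\infty}\exp\left\{-\frac{1}{2} + O\left(\frac{1}{x}\right)\right\} = e^{-1/2}.$$

3.2.5 Problems and Exercises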


1. a) Prove that there exists a unique function defined on $\mathbb{R}$ and satisfying the following conditions:

$$f(1) = a \quad (a > 0,\ a \neq 1),$$
$$f(x_1)\cdot f(x_2) = f(x_1 + x_2),$$
$$f(x) \to f(x_0) \text{ as } x \to x_0.$$

b) Prove that there exists a unique function defined on $\mathbb{R}_+$ and satisfying the following conditions:

$$f(a) = 1 \quad (a > 0,\ a \neq 1),$$
$$f(x_1) + f(x_2) = f(x_1\cdot x_2),$$
$$f(x) \to f(x_0) \text{ for } x_0 \in \mathbb{R}_+ \text{ and } \mathbb{R}_+ \ni x \to x_0.$$

Hint: Look again at the construction of the exponential function and logarithm discussed in Example 10.


2. a) Establish a one-to-one correspondence $\varphi : \mathbb{R} \to \mathbb{R}_+$ such that $\varphi(x + y) = \varphi(x)\cdot\varphi(y)$ for any $x, y \in \mathbb{R}$, that is, so that the operation of multiplication in the image ($\mathbb{R}_+$) corresponds to the operation of addition in the pre-image ($\mathbb{R}$). The existence of such a mapping means that the groups $(\mathbb{R}, +)$ and $(\mathbb{R}_+, \cdot)$ are identical as algebraic objects, or, as we say, they are isomorphic.

b) Prove that the groups $(\mathbb{R}, +)$ and $(\mathbb{R}\setminus 0, \cdot)$ are not isomorphic.


3. Find the following limits.

a) $\lim_{x\to+0} x^{x}$;

b) $\lim_{x\to+\infty} x^{1/x}$;

c) $\lim_{x\to 0}\frac{\log_a(1+x)}{x}$;

d) $\lim_{x\to 0}\frac{a^{x} - 1}{x}$.


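4. Show that

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = \ln n + c + o(1) \quad\text{as } n \to \infty,$$

where $c$ is a constant. (The number $c = 0.57721\ldots$ is called Euler's constant.)

Hint: One can use the relation

$$\ln\frac{n+1}{n} = \ln\left(1 + \frac{1}{n}\right) = \frac{1}{n} + O\left(\frac{1}{n^{2}}\right) \quad\text{as } n \to \infty.$$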
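The statement of Problem 4 can be explored numerically before it is proved: the difference between the partial sum of the harmonic series and $\ln n$ should settle near $0.57721\ldots$. A minimal sketch follows; the function name is hypothetical and `math.fsum` is used only to reduce rounding error in the long sum.

```python
import math

def harmonic_minus_log(n):
    """Return 1 + 1/2 + ... + 1/n - ln(n) for a positive integer n."""
    return math.fsum(1.0 / k for k in range(1, n + 1)) - math.log(n)

for n in (10, 100, 10**3, 10**5):
    print(n, harmonic_minus_log(n))
```

The printed values decrease toward the constant; this does not replace the argument suggested in the hint, which is what shows that the limit exists.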
n n n n 


5. Show that

a) if two series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ with positive terms are such that $a_n \sim b_n$ as $n \to \infty$, then the two series either both converge or both diverge;

b) the series $\sum_{n=1}^{\infty}\sin\frac{1}{n^{p}}$ converges only for $p > 1$.

6. Show that

a) if $a_n \ge a_{n+1} > 0$ for all $n \in \mathbb{N}$ and the series $\sum_{n=1}^{\infty} a_n$ converges, then $a_n = o\left(\frac{1}{n}\right)$ as $n \to \infty$;

b) if $b_n = o\left(\frac{1}{n}\right)$, one can always construct a convergent series $\sum_{n=1}^{\infty} a_n$ such that $b_n = o(a_n)$ as $n \to \infty$;

c) if a series $\sum_{n=1}^{\infty} a_n$ with positive terms converges, then the series $\sum_{n=1}^{\infty} A_n$, where $A_n = \sqrt{\sum_{k=n}^{\infty} a_k} - \sqrt{\sum_{k=n+1}^{\infty} a_k}$, also converges, and $a_n = o(A_n)$ as $n \to \infty$;

d) if a series $\sum_{n=1}^{\infty} a_n$ with positive terms diverges, then the series $\sum_{n=2}^{\infty} A_n$, where $A_n = \sqrt{\sum_{k=1}^{n} a_k} - \sqrt{\sum_{k=1}^{n-1} a_k}$, also diverges, and $A_n = o(a_n)$ as $n \to \infty$.

It follows from c) and d) that no convergent (resp. divergent) series can serve as a universal standard of comparison to establish the convergence (resp. divergence) of other series.


7. Show that

a) the series $\sum_{n=1}^{\infty}\ln a_n$, where $a_n > 0$, $n \in \mathbb{N}$, converges if and only if the sequence $\{\Pi_n = a_1\cdots a_n\}$ has a finite nonzero limit;

b) the series $\sum_{n=1}^{\infty}\ln(1 + \alpha_n)$, where $|\alpha_n| < 1$, converges absolutely if and only if the series $\sum_{n=1}^{\infty}\alpha_n$ converges absolutely.

Hint: See part a) of Exercise 5.


8. An infinite product $\prod_{k=1}^{\infty} e_k$ is said to converge if the sequence of numbers $\Pi_n = \prod_{k=1}^{n} e_k$ has a finite nonzero limit $\Pi$. We then set $\Pi = \prod_{k=1}^{\infty} e_k$.

Show that

a) if an infinite product $\prod_{n=1}^{\infty} e_n$ converges, then $e_n \to 1$ as $n \to \infty$;

b) if $\forall n \in \mathbb{N}\ (e_n > 0)$, then the infinite product $\prod_{n=1}^{\infty} e_n$ converges if and only if the series $\sum_{n=1}^{\infty}\ln e_n$ converges;

c) if $e_n = 1 + \alpha_n$ and the $\alpha_n$ are all of the same sign, then the infinite product $\prod_{n=1}^{\infty}(1 + \alpha_n)$ converges if and only if the series $\sum_{n=1}^{\infty}\alpha_n$ converges.

9. a) Find the product $\prod_{n=1}^{\infty}\bigl(1 + x^{2^{n-1}}\bigr)$.

b) Find $\prod_{n=1}^{\infty}\cos\frac{x}{2^{n}}$ and prove the following theorem of Viète¹⁴

$$\frac{\pi}{2} = \frac{1}{\sqrt{\frac{1}{2}}\cdot\sqrt{\frac{1}{2} + \frac{1}{2}\sqrt{\frac{1}{2}}}\cdot\sqrt{\frac{1}{2} + \frac{1}{2}\sqrt{\frac{1}{2} + \frac{1}{2}\sqrt{\frac{1}{2}}}}\cdots}.$$

c) Find the function $f(x)$ if

$$f(0) = 1,$$
$$f(2x) = \cos^{2}x\cdot f(x),$$
$$f(x) \to f(0) \text{ as } x \to 0.$$

Hint: $x = 2\cdot\frac{x}{2}$.


10. Show that

a) if $\frac{b_n}{b_{n+1}} = 1 + \beta_n$, $n = 1, 2, \ldots$, and the series $\sum_{n=1}^{\infty}\beta_n$ converges absolutely, then the limit $\lim_{n\to\infty} b_n = b \in \mathbb{R}$ exists;

b) if $\frac{a_n}{a_{n+1}} = 1 + \frac{p}{n} + \alpha_n$, $n = 1, 2, \ldots$, and the series $\sum_{n=1}^{\infty}\alpha_n$ converges absolutely, then $a_n \sim \frac{c}{n^{p}}$ as $n \to \infty$;

c) if the series $\sum_{n=1}^{\infty} a_n$ is such that $\frac{a_n}{a_{n+1}} = 1 + \frac{p}{n} + \alpha_n$ and the series $\sum_{n=1}^{\infty}\alpha_n$ converges absolutely, then $\sum_{n=1}^{\infty} a_n$ converges absolutely for $p > 1$ and diverges for $p \le 1$ (Gauss' test for absolute convergence of a series).

11. Show that

$$\varlimsup_{n\to\infty}\left(\frac{1 + a_{n+1}}{a_n}\right)^{n} \ge e$$

for any sequence $\{a_n\}$ with positive terms, and that this estimate cannot be improved.


14 F. Viète (1540-1603), French mathematician, one of the creators of modern symbolic algebra.


4 Continuous Functions 


4.1 Basic Definitions and Examples 


4.1.1 Continuity of a Function at a Point 


Let f be a real-valued function defined in a neighborhood of a point a € R. In 
intuitive terms the function f is continuous at a if its value f(x) approaches 
the value f(a) that it assumes at the point a itself as x gets nearer to a. 

We shall now make this description of the concept of continuity of a 
function at a point precise. 


Definition 0. A function f is continuous at the point a if for any neighbor- 
hood V(f(a)) of its value f(a) at a there is a neighborhood U (a) of a whose 
image under the mapping f is contained in V(f(a)). 


We now give the expression of this concept in logical symbolism, along with 
two other versions of it that are frequently used in analysis. 


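$$(f \text{ is continuous at } a) := \bigl(\forall V(f(a))\ \exists U(a)\ (f(U(a)) \subset V(f(a)))\bigr),$$
$$\forall\varepsilon > 0\ \exists U(a)\ \forall x \in U(a)\ (|f(x) - f(a)| < \varepsilon),$$
$$\forall\varepsilon > 0\ \exists\delta > 0\ \forall x \in \mathbb{R}\ (|x - a| < \delta \Rightarrow |f(x) - f(a)| < \varepsilon).$$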


The equivalence of these statements for real-valued functions follows from 
the fact (already noted several times) that any neighborhood of a point con- 
tains a symmetric neighborhood of the point. 

For example, if for any $\varepsilon$-neighborhood $V^{\varepsilon}(f(a))$ of $f(a)$ one can choose a neighborhood $U(a)$ of $a$ such that $\forall x \in U(a)\ (|f(x) - f(a)| < \varepsilon)$, that is, $f(U(a)) \subset V^{\varepsilon}(f(a))$, then for any neighborhood $V(f(a))$ one can also choose a corresponding neighborhood of $a$. Indeed, it suffices first to take an $\varepsilon$-neighborhood of $f(a)$ with $V^{\varepsilon}(f(a)) \subset V(f(a))$, and then find $U(a)$ corresponding to $V^{\varepsilon}(f(a))$. Then $f(U(a)) \subset V^{\varepsilon}(f(a)) \subset V(f(a))$.

Thus, if a function is continuous at a in the sense of the second of these 
definitions, it is also continuous at a in the sense of the original definition. 
The converse is obvious, so that the equivalence of the two statements is 
established. 

We leave the rest of the verification to the reader.

To avoid being distracted from the basic concept being defined, that of continuity at a point, we assumed for simplicity to begin with that the function $f$ was defined in a whole neighborhood of $a$. We now consider the general case.

Let $f : E \to \mathbb{R}$ be a real-valued function defined on some set $E \subset \mathbb{R}$ and $a$ a point of the domain of definition of the function.


Definition 1. A function $f : E \to \mathbb{R}$ is continuous at the point $a \in E$ if for every neighborhood $V(f(a))$ of the value $f(a)$ that the function assumes at $a$ there exists a neighborhood $U_E(a)$ of $a$ in $E$¹ whose image $f(U_E(a))$ is contained in $V(f(a))$.


Thus 


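$$(f : E \to \mathbb{R} \text{ is continuous at } a \in E) := \bigl(\forall V(f(a))\ \exists U_E(a)\ (f(U_E(a)) \subset V(f(a)))\bigr).$$

Of course, Definition 1 can also be written in the $\varepsilon$-$\delta$ form discussed above. Where numerical estimates are needed, this will be useful, and even necessary. We now write these versions of Definition 1.

$$(f : E \to \mathbb{R} \text{ is continuous at } a \in E) := \bigl(\forall\varepsilon > 0\ \exists U_E(a)\ \forall x \in U_E(a)\ (|f(x) - f(a)| < \varepsilon)\bigr),$$

or

$$(f : E \to \mathbb{R} \text{ is continuous at } a \in E) := \bigl(\forall\varepsilon > 0\ \exists\delta > 0\ \forall x \in E\ (|x - a| < \delta \Rightarrow |f(x) - f(a)| < \varepsilon)\bigr).$$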


We now discuss in detail the concept of continuity of a function at a point. 


1° If $a$ is an isolated point, that is, not a limit point of $E$, there is a neighborhood $U(a)$ of $a$ containing no points of $E$ except $a$ itself. In this case $U_E(a) = a$, and therefore $f(U_E(a)) = f(a) \subset V(f(a))$ for any neighborhood $V(f(a))$. Thus a function is obviously continuous at any isolated point of its domain of definition. This, however, is a degenerate case.


2° The substantive part of the concept of continuity thus involves the case when $a \in E$ and $a$ is a limit point of $E$. It is clear from Definition 1 that

$$(f : E \to \mathbb{R} \text{ is continuous at } a \in E, \text{ where } a \text{ is a limit point of } E) \Leftrightarrow \Bigl(\lim_{E\ni x\to a} f(x) = f(a)\Bigr).$$

Proof. In fact, if $a$ is a limit point of $E$, then the base $E \ni x \to a$ of deleted neighborhoods $\mathring{U}_E(a) = U_E(a)\setminus a$ of $a$ is defined.


1 We recall that $U_E(a) = E \cap U(a)$.

If $f$ is continuous at $a$, then, by finding a neighborhood $U_E(a)$ for the neighborhood $V(f(a))$ such that $f(U_E(a)) \subset V(f(a))$, we will simultaneously have $f(\mathring{U}_E(a)) \subset V(f(a))$. By definition of limit, therefore, $\lim_{E\ni x\to a} f(x) = f(a)$.

Conversely, if we know that $\lim_{E\ni x\to a} f(x) = f(a)$, then, given a neighborhood $V(f(a))$, we find a deleted neighborhood $\mathring{U}_E(a)$ such that $f(\mathring{U}_E(a)) \subset V(f(a))$. But since $f(a) \in V(f(a))$, we then have also $f(U_E(a)) \subset V(f(a))$. By Definition 1 this means that $f$ is continuous at $a \in E$. □


3° Since the relation $\lim_{E\ni x\to a} f(x) = f(a)$ can be rewritten as

$$\lim_{E\ni x\to a} f(x) = f\Bigl(\lim_{E\ni x\to a} x\Bigr),$$


we now arrive at the useful conclusion that the continuous functions (opera- 
tions) and only the continuous ones commute with the operation of passing 
to the limit at a point. This means that the number f(a) obtained by carry- 
ing out the operation f on the number a can be approximated as closely as 
desired by the values obtained by carrying out the operation f on values of 
x that approximate a with suitable accuracy. 


4° If we remark that for $a \in E$ the neighborhoods $U_E(a)$ of $a$ form a base $\mathcal{B}_a$ (whether $a$ is a limit point or an isolated point of $E$), we see that Definition 1 of continuity of a function at the point $a$ is the same as the definition of the statement that the number $f(a)$, the value of the function at $a$, is the limit of the function over this base, that is

$$(f : E \to \mathbb{R} \text{ is continuous at } a \in E) \Leftrightarrow \Bigl(\lim_{\mathcal{B}_a} f(x) = f(a)\Bigr).$$


5° We remark, however, that if $\lim_{\mathcal{B}_a} f(x)$ exists, since $a \in U_E(a)$ for every neighborhood $U_E(a)$, it follows that this limit must necessarily be $f(a)$.

Thus, continuity of a function $f : E \to \mathbb{R}$ at a point $a \in E$ is equivalent to the existence of the limit of this function over the base $\mathcal{B}_a$ of neighborhoods (not deleted neighborhoods) $U_E(a)$ of $a \in E$.

Thus

$$(f : E \to \mathbb{R} \text{ is continuous at } a \in E) \Leftrightarrow \Bigl(\exists\lim_{\mathcal{B}_a} f(x)\Bigr).$$


6° By the Cauchy criterion for the existence of a limit, we can now say that a function is continuous at a point $a \in E$ if and only if for every $\varepsilon > 0$ there exists a neighborhood $U_E(a)$ of $a$ in $E$ on which the oscillation $\omega(f; U_E(a))$ of the function is less than $\varepsilon$.


Definition 2. The quantity $\omega(f; a) = \lim_{\delta\to+0}\omega(f; U_E^{\delta}(a))$ (where $U_E^{\delta}(a)$ is the $\delta$-neighborhood of $a$ in $E$) is called the oscillation of $f : E \to \mathbb{R}$ at $a$.

Formally the symbol $\omega(f; X)$ has already been taken; it denotes the oscillation of the function on the set $X$. However, we shall never consider the oscillation of a function on a set consisting of a single point (it would obviously be zero); therefore the symbol $\omega(f; a)$, where $a$ is a point, will always denote the concept of oscillation at a point just defined in Definition 2.

The oscillation of a function on a subset of a set does not exceed its oscillation on the set itself, so that $\omega(f; U_E^{\delta}(a))$ is a nondecreasing function of $\delta$. Since it is nonnegative, either it has a finite limit as $\delta \to +0$, or else $\omega(f; U_E^{\delta}(a)) = +\infty$ for every $\delta > 0$. In the latter case we naturally set $\omega(f; a) = +\infty$.

7° Using Definition 2 we can summarize what was said in 6° as follows: a function is continuous at a point if and only if its oscillation at that point is zero. Let us make this explicit:

$$(f : E \to \mathbb{R} \text{ is continuous at } a \in E) \Leftrightarrow (\omega(f; a) = 0).$$


Definition 3. A function $f : E \to \mathbb{R}$ is continuous on the set $E$ if it is continuous at each point of $E$.


The set of all continuous real-valued functions defined on a set $E$ will be denoted $C(E; \mathbb{R})$ or, more briefly, $C(E)$.

We have now discussed the concept of continuity of a function. Let us 
consider some examples. 


Example 1. If $f : E \to \mathbb{R}$ is a constant function, then
$f \in C(E)$. This is obvious, since $f(E) = \{c\} \subset V(c)$ for any
neighborhood $V(c)$ of $c \in \mathbb{R}$.

Example 2. The function $f(x) = x$ is continuous on $\mathbb{R}$. Indeed, for
any point $x_0 \in \mathbb{R}$ we have
$|f(x) - f(x_0)| = |x - x_0| < \varepsilon$ provided
$|x - x_0| < \delta = \varepsilon$.


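Example 3. The function $f(x) = \sin x$ is continuous on $\mathbb{R}$.
In fact, for any point $x_0 \in \mathbb{R}$ we have

\[
|\sin x - \sin x_0| = \left| 2 \cos\frac{x + x_0}{2} \sin\frac{x - x_0}{2} \right|
\le 2 \left| \sin\frac{x - x_0}{2} \right|
\le 2 \left| \frac{x - x_0}{2} \right| = |x - x_0| < \varepsilon,
\]

provided $|x - x_0| < \delta = \varepsilon$.

Here we have used the inequality $|\sin x| \le |x|$ proved in Example 9 of
Paragraph d) of Subsect. 3.2.2.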


Example 4. The function $f(x) = \cos x$ is continuous on $\mathbb{R}$.
Indeed, as in the preceding example, for any point $x_0 \in \mathbb{R}$ we have

\[
|\cos x - \cos x_0| = \left| -2 \sin\frac{x + x_0}{2} \sin\frac{x - x_0}{2} \right|
\le 2 \left| \sin\frac{x - x_0}{2} \right| \le |x - x_0| < \varepsilon,
\]

provided $|x - x_0| < \delta = \varepsilon$.


Example 5. The function $f(x) = a^x$ is continuous on $\mathbb{R}$.
Indeed, by property 3) of the exponential function (see Par. d in
Subsect. 3.2.2, Example 10a), at any point $x_0 \in \mathbb{R}$ we have

\[
\lim_{x \to x_0} a^x = a^{x_0},
\]

which, as we now know, is equivalent to the continuity of the function $a^x$
at the point $x_0$.


Example 6. The function $f(x) = \log_a x$ is continuous at any point $x_0$ in
its domain of definition $\mathbb{R}_+ = \{x \in \mathbb{R} \mid x > 0\}$.

In fact, by property 3) of the logarithm (see Par. d in Subsect. 3.2.2,
Example 10b), at each point $x_0 \in \mathbb{R}_+$ we have

\[
\lim_{\mathbb{R}_+ \ni x \to x_0} \log_a x = \log_a x_0,
\]

which is equivalent to the continuity of the function $\log_a x$ at the point
$x_0$.

Now, given $\varepsilon > 0$, let us try to find a neighborhood
$U_{\mathbb{R}_+}(x_0)$ of the point $x_0$ so as to have

\[
|\log_a x - \log_a x_0| < \varepsilon
\]

at each point $x \in U_{\mathbb{R}_+}(x_0)$.
This inequality is equivalent to the relations

\[
-\varepsilon < \log_a \frac{x}{x_0} < \varepsilon.
\]

For definiteness assume $a > 1$; then these last relations are equivalent to

\[
x_0 a^{-\varepsilon} < x < x_0 a^{\varepsilon}.
\]

The open interval $]x_0 a^{-\varepsilon}, x_0 a^{\varepsilon}[$ is the
neighborhood of the point $x_0$ that we are seeking. It is useful to note that
this neighborhood depends on both $\varepsilon$ and the point $x_0$, a
phenomenon that did not occur in Examples 1-4.
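The dependence of the neighborhood on $x_0$ in Example 6 can be made concrete with a short computation. The sketch below assumes $a > 1$, as in the text, and uses the hypothetical helper names `log_neighborhood` and `symmetric_delta`; the symmetric radius it returns is one convenient choice of $\delta$, not the only possible one.

```python
import math

def log_neighborhood(x0, eps, a=2.0):
    """Endpoints of ]x0*a**(-eps), x0*a**eps[, where |log_a x - log_a x0| < eps."""
    if a <= 1 or x0 <= 0 or eps <= 0:
        raise ValueError("this sketch assumes a > 1, x0 > 0 and eps > 0")
    return x0 * a ** (-eps), x0 * a ** eps

def symmetric_delta(x0, eps, a=2.0):
    """Largest delta such that ]x0 - delta, x0 + delta[ fits in that interval."""
    left, right = log_neighborhood(x0, eps, a)
    return min(x0 - left, right - x0)

# The same eps forces a much smaller delta near 0 than at x0 = 1:
delta_at_one = symmetric_delta(1.0, 0.1)
delta_near_zero = symmetric_delta(0.01, 0.1)
```

In this sketch the admissible radius equals $x_0(1 - a^{-\varepsilon})$, so it is proportional to $x_0$ and shrinks to zero as $x_0 \to +0$ for a fixed $\varepsilon$.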


Example 7. Any sequence $f : \mathbb{N} \to \mathbb{R}$ is a function that is
continuous on the set $\mathbb{N}$ of natural numbers, since each point of
$\mathbb{N}$ is isolated.

4.1.2 Points of Discontinuity 


To improve our mastery of the concept of continuity, we shall explain what 
happens to a function in a neighborhood of a point where it is not continuous. 


Definition 4. If the function $f : E \to \mathbb{R}$ is not continuous at a
point of $E$, this point is called a point of discontinuity or simply a
discontinuity of $f$.

By constructing the negation of the statement “the function
$f : E \to \mathbb{R}$ is continuous at the point $a \in E$”, we obtain the
following expression of the definition of the statement that $a$ is a point
of discontinuity of $f$:

\[
(a \in E \text{ is a point of discontinuity of } f) :=
\bigl(\exists V(f(a))\ \forall U_E(a)\ \exists x \in U_E(a)\
(f(x) \notin V(f(a)))\bigr).
\]

In other words, $a \in E$ is a point of discontinuity of the function
$f : E \to \mathbb{R}$ if there is a neighborhood $V(f(a))$ of the value
$f(a)$ that the function assumes at $a$ such that in any neighborhood
$U_E(a)$ of $a$ in $E$ there is a point $x$ whose image is not in $V(f(a))$.

In $\varepsilon$-$\delta$ form, this definition has the following appearance:

\[
\exists \varepsilon > 0\ \forall \delta > 0\ \exists x \in E\
(|x - a| < \delta \wedge |f(x) - f(a)| \ge \varepsilon).
\]
Let us consider some examples. 


Example 8. The function $f(x) = \operatorname{sgn} x$ is constant and hence
continuous in the neighborhood of any point $a \in \mathbb{R}$ that is
different from 0. But in any neighborhood of 0 its oscillation equals 2.
Hence 0 is a point of discontinuity for $\operatorname{sgn} x$. We remark
that this function has a left-hand limit
$\lim_{x \to -0} \operatorname{sgn} x = -1$ and a right-hand limit
$\lim_{x \to +0} \operatorname{sgn} x = 1$. However, in the first place,
these limits are not the same; and in the second place, neither of them is
equal to the value of $\operatorname{sgn} x$ at the point 0, namely
$\operatorname{sgn} 0 = 0$. This is a direct verification that 0 is a point
of discontinuity for this function.


Example 9. The function $f(x) = |\operatorname{sgn} x|$ has the limit
$\lim_{x \to 0} |\operatorname{sgn} x| = 1$ as $x \to 0$, but
$f(0) = |\operatorname{sgn} 0| = 0$, so that $\lim_{x \to 0} f(x) \ne f(0)$,
and 0 is therefore a point of discontinuity of the function.

We remark, however, that in this case, if we were to change the value of
the function at the point 0 and set it equal to 1 there, we would obtain a
function that is continuous at 0, that is, we would remove the discontinuity.


Definition 5. If a point of discontinuity $a \in E$ of the function
$f : E \to \mathbb{R}$ is such that there exists a continuous function
$\tilde{f} : E \to \mathbb{R}$ such that
$f|_{E \setminus a} = \tilde{f}|_{E \setminus a}$, then $a$ is called a
removable discontinuity of the function $f$.


Thus a removable discontinuity is characterized by the fact that the limit
$\lim_{E \ni x \to a} f(x) = A$ exists, but $A \ne f(a)$, and it suffices to
set

\[
\tilde{f}(x) =
\begin{cases}
f(x) & \text{for } x \in E,\ x \ne a, \\
A & \text{for } x = a,
\end{cases}
\]


in order to obtain a function $\tilde{f} : E \to \mathbb{R}$ that is
continuous at $a$.

Example 10. The function

\[
f(x) =
\begin{cases}
\sin\frac{1}{x}, & \text{for } x \ne 0, \\
0, & \text{for } x = 0,
\end{cases}
\]

is discontinuous at 0. Moreover, it does not even have a limit as $x \to 0$,
since, as was shown in Example 5 in Subsect. 3.2.1,
$\lim_{x \to 0} \sin\frac{1}{x}$ does not exist.

The graph of the function $\sin\frac{1}{x}$ is shown in Fig. 4.1.


Fig. 4.1. Graph of the function $y = \sin\frac{1}{x}$.


Examples 8, 9, and 10 explain the following terminology. 


Definition 6. The point $a \in E$ is called a discontinuity of first kind for
the function $f : E \to \mathbb{R}$ if the following limits² exist:

\[
\lim_{E \ni x \to a - 0} f(x) = f(a - 0), \qquad
\lim_{E \ni x \to a + 0} f(x) = f(a + 0),
\]

but at least one of them is not equal to the value $f(a)$ that the function
assumes at $a$.


Definition 7. If $a \in E$ is a point of discontinuity of the function
$f : E \to \mathbb{R}$ and at least one of the two limits in Definition 6
does not exist, then $a$ is called a discontinuity of second kind.


Thus what is meant is that every point of discontinuity that is not a 
discontinuity of first kind is automatically a discontinuity of second kind. 
Let us present two more classical examples. 


2 If a is a discontinuity, then a must be a limit point of the set E. It may happen,
however, that all the points of E in some neighborhood of a lie on one side of a.
In that case, only one of the limits in this definition is considered.

Example 11. The function

\[
D(x) =
\begin{cases}
1, & \text{if } x \in \mathbb{Q}, \\
0, & \text{if } x \in \mathbb{R} \setminus \mathbb{Q},
\end{cases}
\]

is called the Dirichlet function.³

This function is discontinuous at every point, and obviously all of its
discontinuities are of second kind, since in every interval there are both
rational and irrational numbers.


Example 12. Consider the Riemann function⁴

\[
R(x) =
\begin{cases}
\frac{1}{n}, & \text{if } x = \frac{m}{n} \in \mathbb{Q},
\text{ where } \frac{m}{n} \text{ is in lowest terms,} \\
0, & \text{if } x \in \mathbb{R} \setminus \mathbb{Q}.
\end{cases}
\]

We remark that for any point $a \in \mathbb{R}$, any bounded neighborhood
$U(a)$ of it, and any number $N \in \mathbb{N}$, the neighborhood $U(a)$
contains only a finite number of rational numbers $\frac{m}{n}$,
$m \in \mathbb{Z}$, $n \in \mathbb{N}$, with $n < N$.

By shrinking the neighborhood, one can then assume that the denominators of
all rational numbers in the neighborhood (except possibly for the point $a$
itself if $a \in \mathbb{Q}$) are larger than $N$. Thus at any point
$x \in \mathring{U}(a)$ of the deleted neighborhood we have $|R(x)| < 1/N$.

We have thereby shown that

\[
\lim_{x \to a} R(x) = 0
\]

at any point $a \in \mathbb{R} \setminus \mathbb{Q}$. Hence the Riemann
function is continuous at any irrational number. At the remaining points,
that is, at points $x \in \mathbb{Q}$, the function is discontinuous, except
at the point $x = 0$, and all of these discontinuities are discontinuities of
first kind.
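The counting argument of Example 12 can be checked on exact rational arithmetic. This is a minimal sketch, assuming that inputs are given as `Fraction` objects (irrational points cannot be represented this way); the names `riemann` and `small_denominator_rationals` are hypothetical. Note also that the value returned at $x = 0$ depends on writing $0 = \frac{0}{1}$, a convention the text does not spell out.

```python
import math
from fractions import Fraction

def riemann(x):
    """R(x) = 1/n for a rational x = m/n in lowest terms.

    Fraction keeps every rational in lowest terms with a positive
    denominator. For irrational x the definition gives R(x) = 0, but
    such x cannot be passed to this sketch.
    """
    return Fraction(1, Fraction(x).denominator)

def small_denominator_rationals(a, radius, N):
    """All rationals m/n with n < N inside ]a - radius, a + radius[."""
    found = set()
    for n in range(1, N):
        # Only finitely many numerators m can place m/n in a bounded interval.
        m_low = math.floor((a - radius) * n)
        m_high = math.ceil((a + radius) * n)
        for m in range(m_low, m_high + 1):
            q = Fraction(m, n)
            if abs(q - a) < radius:
                found.add(q)
    return found

# Around a = 1/3 with radius 1/100, the only rational with denominator
# below 10 is a itself, so R(x) <= 1/10 at every other rational point.
a = Fraction(1, 3)
exceptions = small_denominator_rationals(a, Fraction(1, 100), 10)
```

Because the returned set is finite, the neighborhood can always be shrunk until it excludes every element other than $a$, which is the step used in the text.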


4.2 Properties of Continuous Functions 


4.2.1 Local Properties 


The local properties of functions are those that are determined by the be- 
havior of the function in an arbitrarily small neighborhood of the point in its 
domain of definition. 


3 P. G. Dirichlet (1805-1859), great German mathematician, an analyst who
occupied the post of professor ordinarius at Göttingen University after the
death of Gauss in 1855.

4 B. F. Riemann (1826-1866), outstanding German mathematician whose
groundbreaking works laid the foundations of whole areas of modern geometry
and analysis.

Thus, the local properties themselves characterize the behavior of a function
in any limiting relation when the argument of the function tends to the
point in question. For example, the continuity of a function at a point of its
domain of definition is obviously a local property.

We shall now exhibit the main local properties of continuous functions. 


Theorem 1. Let $f : E \to \mathbb{R}$ be a function that is continuous at the
point $a \in E$. Then the following statements hold.

1° The function $f : E \to \mathbb{R}$ is bounded in some neighborhood
$U_E(a)$ of $a$.

2° If $f(a) \ne 0$, then in some neighborhood $U_E(a)$ all the values of the
function have the same sign as $f(a)$.

3° If the function $g : U_E(a) \to \mathbb{R}$ is defined in some
neighborhood of $a$ and, like $f$, is continuous at $a$, then the following
functions are defined in some neighborhood of $a$ and continuous at $a$:

a) $(f + g)(x) := f(x) + g(x)$,

b) $(f \cdot g)(x) := f(x) \cdot g(x)$,

c) $\left(\frac{f}{g}\right)(x) := \frac{f(x)}{g(x)}$ (provided $g(a) \ne 0$).

4° If the function $g : Y \to \mathbb{R}$ is continuous at a point $b \in Y$
and $f$ is such that $f : E \to Y$, $f(a) = b$, and $f$ is continuous at $a$,
then the composite function $(g \circ f)$ is defined on $E$ and continuous
at $a$.


Proof. To prove this theorem it suffices to recall (see Sect. 4.1) that the
continuity of the function $f$ or $g$ at a point $a$ of its domain of
definition is equivalent to the condition that the limit of this function
exists over the base $\mathcal{B}_a$ of neighborhoods of $a$ and is equal to
the value of the function at $a$:
$\lim_{\mathcal{B}_a} f(x) = f(a)$, $\lim_{\mathcal{B}_a} g(x) = g(a)$.

Thus assertions 1°, 2°, and 3° of Theorem 1 follow immediately from
the definition of continuity of a function at a point and the corresponding
properties of the limit of a function.

The only explanation required is to verify that the ratio
$\frac{f(x)}{g(x)}$ is actually defined in some neighborhood $U_E(a)$ of $a$.
But by hypothesis $g(a) \ne 0$, and by assertion 2° of the theorem there
exists a neighborhood $U_E(a)$ at every point of which $g(x) \ne 0$, that is,
$\frac{f(x)}{g(x)}$ is defined in $U_E(a)$.

Assertion 4° of Theorem 1 is a consequence of the theorem on the limit
of a composite function, by virtue of which

\[
\lim_{\mathcal{B}_a} (g \circ f)(x) = \lim_{\mathcal{B}_b} g(y) = g(b)
= g(f(a)) = (g \circ f)(a),
\]

which is equivalent to the continuity of $(g \circ f)$ at $a$.

However, to apply the theorem on the limit of a composite function, we
must verify that for any element $U_Y(b)$ of the base $\mathcal{B}_b$ there
exists an element $U_E(a)$ of the base $\mathcal{B}_a$ such that
$f(U_E(a)) \subset U_Y(b)$. But in fact, if $U_Y(b) = Y \cap U(b)$, then by
definition of the continuity of $f : E \to Y$ at the point $a$, given a
neighborhood $U(b) = U(f(a))$, there is a neighborhood $U_E(a)$ of $a$ in $E$
such that $f(U_E(a)) \subset U(f(a))$. Since the range of $f$ is contained in
$Y$, we have $f(U_E(a)) \subset Y \cap U(f(a)) = U_Y(b)$, and we have
justified the application of the theorem on the limit of a composite
function. □


Example 1. An algebraic polynomial
$P(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_n$ is a continuous function on
$\mathbb{R}$.

Indeed, it follows by induction from 3° of Theorem 1 that the sum and
product of any finite number of functions that are continuous at a point are
themselves continuous at that point. We have verified in Examples 1 and 2 of
Sect. 4.1 that the constant function and the function $f(x) = x$ are
continuous on $\mathbb{R}$. It then follows that the functions
$a x^m = a \cdot \underbrace{x \cdot \ldots \cdot x}_{m \text{ factors}}$ are
continuous, and consequently the polynomial $P(x)$ is also.


Example 2. A rational function $R(x) = \frac{P(x)}{Q(x)}$, a quotient of
polynomials, is continuous wherever it is defined, that is, where
$Q(x) \ne 0$. This follows from Example 1 and assertion 3° of Theorem 1.


Example 3. The composition of a finite number of continuous functions is
continuous at each point of its domain of definition. This follows by induction
from assertion 4° of Theorem 1. For example, the function
$e^{\sin^2(\ln|\cos x|)}$ is continuous on all of $\mathbb{R}$, except at the
points $\frac{\pi}{2}(2k + 1)$, $k \in \mathbb{Z}$, where it is not defined.


4.2.2 Global Properties of Continuous Functions 


A global property of a function, intuitively speaking, is a property involving 
the entire domain of definition of the function. 


Theorem 2. (The Bolzano—Cauchy intermediate-value theorem). If a func- 
tion that is continuous on a closed interval assumes values with different signs 
at the endpoints of the interval, then there is a point in the interval where it 
assumes the value 0. 


In logical symbols, this theorem has the following expression.⁵

\[
(f \in C[a,b] \wedge f(a) \cdot f(b) < 0) \Rightarrow
\exists c \in [a,b]\ (f(c) = 0).
\]


Proof. Let us divide the interval $[a,b]$ in half. If the function does not
assume the value 0 at the point of division, then it must assume values with
opposite signs at the endpoints of one of the two subintervals. In that
interval we proceed as we did with the original interval, that is, we bisect
it and continue the process.

Then either at some step we hit a point $c \in [a,b]$ where $f(c) = 0$, or
we obtain a sequence $\{I_n\}$ of nested closed intervals whose lengths tend
to zero and at whose endpoints $f$ assumes values with opposite signs. In the
second case, by the nested interval lemma, there exists a unique point
$c \in [a,b]$ common to all the intervals. By construction there are two
sequences of endpoints $\{x'_n\}$ and $\{x''_n\}$ of the intervals $I_n$ such
that $f(x'_n) < 0$ and $f(x''_n) > 0$, while
$\lim_{n \to \infty} x'_n = \lim_{n \to \infty} x''_n = c$. By the properties
of a limit and the definition of continuity, we then find that
$\lim_{n \to \infty} f(x'_n) = f(c) \le 0$ and
$\lim_{n \to \infty} f(x''_n) = f(c) \ge 0$. Thus $f(c) = 0$. □

5 We recall that $C(E)$ denotes the set of all continuous functions on the
set $E$. In the case $E = [a,b]$ we often write, more briefly, $C[a,b]$
instead of $C([a,b])$.


Remarks to Theorem 2 1° The proof of the theorem provides a very 
simple algorithm for finding a root of the equation f(x) = 0 on an interval at 
whose endpoints a continuous function f(x) has values with opposite signs. 


2° Theorem 2 thus asserts that it is impossible to pass continuously from
positive to negative values without assuming the value zero along the way.


3° One should be wary of intuitive remarks like Remark 2°, since they usually 
assume more than they state. Consider, for example, the function equal to 
—1 on the closed interval [0, 1] and equal to 1 on the closed interval [2, 3]. It is 
clear that this function is continuous on its domain of definition and assumes 
values with opposite signs, yet never assumes the value 0. This remark shows 
that the property of a continuous function expressed by Theorem 2 is actually 
the result of a certain property of the domain of definition (which, as will be 
made clear below, is the property of being connected.) 
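The algorithm mentioned in Remark 1° can be sketched as follows. This is a minimal floating-point illustration of the bisection used in the proof of Theorem 2, assuming $a < b$ and a function that can be evaluated numerically; the name `bisect_root` and the parameters `tol` and `max_steps` are hypothetical choices, and the result is an approximation of a root, not the exact point $c$ of the theorem.

```python
def bisect_root(f, a, b, tol=1e-12, max_steps=200):
    """Approximate a root of f on [a, b] by repeated halving.

    Assumptions: a < b, f is continuous on [a, b], and f(a), f(b)
    have opposite signs (or one of them is zero).
    """
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa < 0) == (fb < 0):
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_steps):
        c = (a + b) / 2
        fc = f(c)
        if fc == 0 or (b - a) / 2 < tol:
            return c
        # Keep the half at whose endpoints f still changes sign.
        if (fa < 0) != (fc < 0):
            b, fb = c, fc
        else:
            a, fa = c, fc
    return (a + b) / 2

# Usage: the positive root of x**2 - 2 on [1, 2].
root = bisect_root(lambda x: x * x - 2, 1.0, 2.0)
```

Each step halves the interval, so the error after $n$ steps is at most $(b - a)/2^n$; this mirrors the nested intervals $I_n$ of the proof.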


Corollary to Theorem 2. If the function $\varphi$ is continuous on an open
interval and assumes values $\varphi(a) = A$ and $\varphi(b) = B$ at points
$a$ and $b$, then for any number $C$ between $A$ and $B$, there is a point $c$
between $a$ and $b$ at which $\varphi(c) = C$.


Proof. The closed interval $I$ with endpoints $a$ and $b$ lies inside the open
interval on which $\varphi$ is defined. Therefore the function
$f(x) = \varphi(x) - C$ is defined and continuous on $I$. Since
$f(a) \cdot f(b) = (A - C)(B - C) < 0$, Theorem 2 implies that there is a
point $c$ between $a$ and $b$ at which $f(c) = \varphi(c) - C = 0$. □


Theorem 3. (The Weierstrass maximum-value theorem). A function that is 
continuous on a closed interval is bounded on that interval. Moreover there 
is a point in the interval where the function assumes its maximum value and 
a point where it assumes its minimal value. 


Proof. Let $f : E \to \mathbb{R}$ be a continuous function on the closed
interval $E = [a,b]$. By the local properties of a continuous function (see
Theorem 1) for any point $x \in E$ there exists a neighborhood $U(x)$ such
that the function is bounded on the set $U_E(x) = E \cap U(x)$. The set of
such neighborhoods $U(x)$ constructed for all $x \in E$ forms a covering of
the closed interval $[a,b]$ by open intervals. By the finite covering lemma,
one can extract a finite system $U(x_1), \ldots, U(x_n)$ of open intervals
that together cover the closed interval $[a,b]$. Since the function is bounded
on each set $E \cap U(x_k) = U_E(x_k)$, that is, $m_k \le f(x) \le M_k$, where
$m_k$ and $M_k$ are real numbers and $x \in U_E(x_k)$, we have

\[
\min\{m_1, \ldots, m_n\} \le f(x) \le \max\{M_1, \ldots, M_n\}
\]

at any point $x \in E = [a,b]$. It is now established that $f(x)$ is bounded
on $[a,b]$.

Now let $M = \sup_{x \in E} f(x)$. Assume that $f(x) < M$ at every point
$x \in E$. Then the continuous function $M - f(x)$ on $E$ is nowhere zero,
although (by the definition of $M$) it assumes values arbitrarily close to 0.
It then follows that the function $\frac{1}{M - f(x)}$ is, on the one hand,
continuous on $E$ because of the local properties of continuous functions,
but on the other hand not bounded on $E$, which contradicts what has just
been proved about a function continuous on a closed interval.

Thus there must be a point $x_M \in [a,b]$ at which $f(x_M) = M$.

Similarly, by considering $m = \inf_{x \in E} f(x)$ and the auxiliary function
$\frac{1}{f(x) - m}$, we prove that there exists a point $x_m \in [a,b]$ at
which $f(x_m) = m$. □


We remark that, for example, the functions $f_1(x) = x$ and
$f_2(x) = \frac{1}{x}$ are continuous on the open interval $E = (0,1)$, but
$f_1$ has neither a maximal nor a minimal value on $E$, and $f_2$ is unbounded
on $E$. Thus, the properties of a continuous function expressed in Theorem 3
involve some property of the domain of definition, namely the property that
from every covering of $E$ by open intervals one can extract a finite
subcovering. From now on we shall call such sets compact.

Before passing to the next theorem, we give a definition. 


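Definition 1. A function $f : E \to \mathbb{R}$ is uniformly continuous on a
set $E \subset \mathbb{R}$ if for every $\varepsilon > 0$ there exists
$\delta > 0$ such that $|f(x_1) - f(x_2)| < \varepsilon$ for all points
$x_1, x_2 \in E$ such that $|x_1 - x_2| < \delta$.


More briefly,

\[
(f : E \to \mathbb{R} \text{ is uniformly continuous}) :=
\bigl(\forall \varepsilon > 0\ \exists \delta > 0\ \forall x_1 \in E\
\forall x_2 \in E\ (|x_1 - x_2| < \delta \Rightarrow
|f(x_1) - f(x_2)| < \varepsilon)\bigr).
\]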
Let us now discuss the concept of uniform continuity. 


1° If a function is uniformly continuous on a set, it is continuous at each
point of that set. Indeed, in the definition just given it suffices to set
$x_1 = x$ and $x_2 = a$, and we see that the definition of continuity of a
function $f : E \to \mathbb{R}$ at a point $a \in E$ is satisfied.

2° Generally speaking, the continuity of a function does not imply its
uniform continuity.


Example 4. The function $f(x) = \sin\frac{1}{x}$, which we have encountered
many times, is continuous on the open interval $]0,1[ = E$. However, in every
neighborhood of 0 in the set $E$ the function assumes both values $-1$ and 1.
Therefore, for $\varepsilon < 2$, no $\delta > 0$ guarantees the condition
$|f(x_1) - f(x_2)| < \varepsilon$ for all points $x_1, x_2 \in E$ with
$|x_1 - x_2| < \delta$.

In this connection it is useful to write out explicitly the negation of the 
property of uniform continuity for a function: 


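\[
(f : E \to \mathbb{R} \text{ is not uniformly continuous}) :=
\bigl(\exists \varepsilon > 0\ \forall \delta > 0\ \exists x_1 \in E\
\exists x_2 \in E\ (|x_1 - x_2| < \delta \wedge
|f(x_1) - f(x_2)| \ge \varepsilon)\bigr).
\]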


This example makes the difference between continuity and uniform continuity
of a function on a set intuitive. To point out the place in the definition
of uniform continuity from which this difference proceeds, we give a detailed
expression of what it means for a function $f : E \to \mathbb{R}$ to be
continuous on $E$:

\[
(f : E \to \mathbb{R} \text{ is continuous on } E) :=
\bigl(\forall a \in E\ \forall \varepsilon > 0\ \exists \delta > 0\
\forall x \in E\ (|x - a| < \delta \Rightarrow
|f(x) - f(a)| < \varepsilon)\bigr).
\]

Thus the number $\delta$ is chosen knowing the point $a \in E$ and the number
$\varepsilon$, and so for a fixed $\varepsilon$ the number $\delta$ may vary
from one point to another, as happens in the case of the function
$\sin\frac{1}{x}$ considered in Example 4, or in the case of the function
$\log_a x$ or $a^x$ studied over their full domain of definition.

In the case of uniform continuity we are guaranteed the possibility of
choosing $\delta$ knowing only $\varepsilon > 0$ so that $|x - a| < \delta$
implies $|f(x) - f(a)| < \varepsilon$ for all $x \in E$ and $a \in E$.


Example 5. If the function $f : E \to \mathbb{R}$ is unbounded in every
neighborhood of a fixed point $x_0 \in E$, then it is not uniformly
continuous.


Indeed, in that case for any $\delta > 0$ there are points $x_1$ and $x_2$ in
every $\frac{\delta}{2}$-neighborhood of $x_0$ such that
$|f(x_1) - f(x_2)| > 1$ although $|x_1 - x_2| < \delta$.
Such is the situation with the function $f(x) = \frac{1}{x}$ on the set
$\mathbb{R} \setminus 0$. In this case $x_0 = 0$. The same situation holds in
regard to $\log_a x$, which is defined on the set of positive numbers and
unbounded in a neighborhood of $x_0 = 0$.


Example 6. The function $f(x) = x^2$, which is continuous on $\mathbb{R}$, is
not uniformly continuous on $\mathbb{R}$.

In fact, at the points $x'_n = \sqrt{n + 1}$ and $x''_n = \sqrt{n}$, where
$n \in \mathbb{N}$, we have $f(x'_n) = n + 1$ and $f(x''_n) = n$, so that
$f(x'_n) - f(x''_n) = 1$. But

\[
\lim_{n \to \infty} \bigl(\sqrt{n + 1} - \sqrt{n}\bigr)
= \lim_{n \to \infty} \frac{1}{\sqrt{n + 1} + \sqrt{n}} = 0,
\]

so that for any $\delta > 0$ there are points $x'_n$ and $x''_n$ such that
$|x'_n - x''_n| < \delta$, yet $f(x'_n) - f(x''_n) = 1$.

Example 7. The function $f(x) = \sin(x^2)$, which is continuous and bounded
on $\mathbb{R}$, is not uniformly continuous on $\mathbb{R}$. Indeed, at the
points $x'_n = \sqrt{\frac{\pi}{2}(n + 1)}$ and
$x''_n = \sqrt{\frac{\pi}{2} n}$, where $n \in \mathbb{N}$, we have
$|f(x'_n) - f(x''_n)| = 1$, while
$\lim_{n \to \infty} |x'_n - x''_n| = 0$.
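The witness sequences of Examples 6 and 7 are easy to tabulate. The sketch below uses the points $x'_n$, $x''_n$ given in those examples; the function names `witness_pairs` and `gaps` are hypothetical, and the computation is carried out in floating point, so the jumps equal 1 only up to rounding.

```python
import math

def witness_pairs(n):
    """The pairs (x'_n, x''_n) for f(x) = x**2 and for f(x) = sin(x**2)."""
    square_pair = (math.sqrt(n + 1), math.sqrt(n))
    sine_pair = (math.sqrt(math.pi / 2 * (n + 1)), math.sqrt(math.pi / 2 * n))
    return square_pair, sine_pair

def gaps(n):
    """Distance between the two points and the jump of f between them."""
    (x1, x2), (y1, y2) = witness_pairs(n)
    return {
        "square_distance": abs(x1 - x2),
        "square_jump": abs(x1 ** 2 - x2 ** 2),
        "sine_distance": abs(y1 - y2),
        "sine_jump": abs(math.sin(y1 ** 2) - math.sin(y2 ** 2)),
    }

# The distances shrink as n grows while both jumps stay close to 1.
report = gaps(10 ** 6)
```

For any prescribed $\delta > 0$ one can therefore pick $n$ so large that the two points are closer than $\delta$ while the values of the function still differ by 1, which is the negation of uniform continuity written out above with $\varepsilon = 1$.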
After this discussion of the concept of uniform continuity of a function
and comparison of continuity and uniform continuity, we can now appreciate
the following theorem.


Theorem 4. (The Cantor—Heine theorem on uniform continuity). A func- 
tion that is continuous on a closed interval is uniformly continuous on that 
interval. 


We note that this theorem is usually called Cantor’s theorem in the lit- 
erature. To avoid unconventional terminology we shall preserve this common 
name in subsequent references. 


Proof. Let $f : E \to \mathbb{R}$ be a given function, $E = [a,b]$, and
$f \in C(E)$. Since $f$ is continuous at every point $x \in E$, it follows
(see 6° in Subsect. 4.1.1) that, knowing $\varepsilon > 0$ we can find a
$\delta$-neighborhood $U^\delta(x)$ of $x$ such that the oscillation
$\omega(f; U_E^\delta(x))$ of $f$ on the set
$U_E^\delta(x) = E \cap U^\delta(x)$, consisting of the points in the domain
of definition $E$ lying in $U^\delta(x)$, is less than $\varepsilon$. For each
point $x \in E$ we construct a neighborhood $U^\delta(x)$ having this
property. The quantity $\delta$ may vary from one point to another, so that it
would be more accurate, if more cumbersome, to denote the neighborhood by the
symbol $U^{\delta(x)}(x)$, but since the whole symbol is determined by the
point $x$, we can agree on the following abbreviated notation:
$U(x) = U^{\delta(x)}(x)$ and $V(x) = U^{\delta(x)/2}(x)$.

The open intervals $V(x)$, $x \in E$, taken together, cover the closed
interval $[a,b]$, and so by the finite covering lemma one can select a finite
covering $V(x_1), \ldots, V(x_n)$. Let
$\delta = \min\{\frac{1}{2}\delta(x_1), \ldots, \frac{1}{2}\delta(x_n)\}$. We
shall show that $|f(x') - f(x'')| < \varepsilon$ for any points
$x', x'' \in E$ such that $|x' - x''| < \delta$. Indeed, since the system of
open intervals $V(x_1), \ldots, V(x_n)$ covers $E$, there exists an interval
$V(x_i)$ of this system that contains $x'$, that is,
$|x' - x_i| < \frac{1}{2}\delta(x_i)$. But in that case

\[
|x'' - x_i| \le |x' - x''| + |x' - x_i| < \delta + \tfrac{1}{2}\delta(x_i)
\le \tfrac{1}{2}\delta(x_i) + \tfrac{1}{2}\delta(x_i) = \delta(x_i).
\]

Consequently $x', x'' \in U_E^{\delta(x_i)}(x_i) = E \cap U^{\delta(x_i)}(x_i)$
and so $|f(x') - f(x'')| \le \omega(f; U_E^{\delta(x_i)}(x_i)) < \varepsilon$. □


The examples given above show that Cantor’s theorem makes essential
use of a certain property of the domain of definition of the function. It is
clear from the proof that, as in Theorem 3, this property is that from every
covering of $E$ by neighborhoods of its points one can extract a finite
covering.

Now that Theorem 4 has been proved, it is useful to return once again
to the examples studied earlier of functions that are continuous but not
uniformly continuous, in order to clarify how it happens that $\sin(x^2)$,
for example, which is uniformly continuous on each closed interval of the real
line by Cantor’s theorem, is nevertheless not uniformly continuous on
$\mathbb{R}$. The reason is completely analogous to the reason why a
continuous function in general fails to be uniformly continuous. This time we
invite our readers to investigate this question on their own.

We now pass to the last theorem of this section, the inverse function 
theorem. We need to determine the conditions under which a real-valued 
function on a closed interval has an inverse and the conditions under which 
the inverse is continuous. 


Proposition 1. A continuous mapping $f : E \to \mathbb{R}$ of a closed
interval $E = [a,b]$ into $\mathbb{R}$ is injective if and only if the
function $f$ is strictly monotonic on $[a,b]$.


Proof. If $f$ is increasing or decreasing on any set $E \subset \mathbb{R}$
whatsoever, the mapping $f : E \to \mathbb{R}$ is obviously injective: at
different points of $E$ the function assumes different values.

Thus the more substantive part of Proposition 1 consists of the assertion
that every continuous injective mapping $f : [a,b] \to \mathbb{R}$ is realized
by a strictly monotonic function.

Assuming that such is not the case, we find three points $x_1 < x_2 < x_3$
in $[a,b]$ such that $f(x_2)$ does not lie between $f(x_1)$ and $f(x_3)$. In
that case, either $f(x_3)$ lies between $f(x_1)$ and $f(x_2)$ or $f(x_1)$ lies
between $f(x_2)$ and $f(x_3)$. For definiteness assume that the latter is the
case. By hypothesis $f$ is continuous on $[x_2, x_3]$. Therefore, by
Theorem 2, there is a point $x'_1$ in this interval such that
$f(x'_1) = f(x_1)$. We then have $x_1 < x'_1$, but $f(x_1) = f(x'_1)$, which
is inconsistent with the injectivity of the mapping. The case when $f(x_3)$
lies between $f(x_1)$ and $f(x_2)$ is handled similarly. □


Proposition 2. Each strictly monotonic function $f : X \to \mathbb{R}$
defined on a numerical set $X \subset \mathbb{R}$ has an inverse
$f^{-1} : Y \to \mathbb{R}$ defined on the set $Y = f(X)$ of values of $f$,
and $f^{-1}$ has the same kind of monotonicity on $Y$ that $f$ has on $X$.


Proof. The mapping $f : X \to Y = f(X)$ is surjective, that is, it is a
mapping of $X$ onto $Y$. For definiteness assume that $f : X \to Y$ is
increasing on $X$. In that case

\[
\forall x_1 \in X\ \forall x_2 \in X\ (x_1 < x_2 \iff f(x_1) < f(x_2)).
\tag{4.1}
\]

Thus the mapping $f : X \to Y$ assumes different values at different points,
and so is injective. Consequently $f : X \to Y$ is bijective, that is, it is a
one-to-one correspondence between $X$ and $Y$. Therefore the inverse mapping
$f^{-1} : Y \to X$ is defined by the formula $x = f^{-1}(y)$ when $y = f(x)$.

Comparing the definition of the mapping $f^{-1} : Y \to X$ with relation
(4.1), we arrive at the relation

\[
\forall y_1 \in Y\ \forall y_2 \in Y\
(f^{-1}(y_1) < f^{-1}(y_2) \iff y_1 < y_2),
\tag{4.2}
\]

which means that the function $f^{-1}$ is also increasing on its domain of
definition.

The case when $f : X \to Y$ is decreasing on $X$ is obviously handled
similarly. □


In accordance with Proposition 2 just proved, if we are interested in the 
continuity of the function inverse to a real-valued function, it is useful to 
investigate the continuity of monotonic functions. 


Proposition 3. The discontinuities of a function $f : E \to \mathbb{R}$ that
is monotonic on the set $E \subset \mathbb{R}$ can be only discontinuities of
first kind.


Proof. For definiteness let $f$ be nondecreasing. Assume that $a \in E$ is a
point of discontinuity of $f$. Since $a$ cannot be an isolated point of $E$,
$a$ must be a limit point of at least one of the two sets
$E_a^- = \{x \in E \mid x < a\}$ and $E_a^+ = \{x \in E \mid x > a\}$. Since
$f$ is nondecreasing, for any point $x \in E_a^-$ we have $f(x) \le f(a)$, and
the restriction $f|_{E_a^-}$ of $f$ to $E_a^-$ is a nondecreasing function
that is bounded from above. It then follows that the limit

\[
\lim_{E_a^- \ni x \to a} \bigl(f|_{E_a^-}\bigr)(x)
= \lim_{E \ni x \to a - 0} f(x) = f(a - 0)
\]

exists.

The proof that the limit $\lim_{E \ni x \to a + 0} f(x) = f(a + 0)$ exists
when $a$ is a limit point of $E_a^+$ is analogous.

The case when $f$ is a nonincreasing function can be handled either by
repeating the reasoning just given or passing to the function $-f$, so as to
reduce the question to the case already considered. □


Corollary 1. If $a$ is a point of discontinuity of a monotonic function
$f : E \to \mathbb{R}$, then at least one of the limits

\[
\lim_{E \ni x \to a - 0} f(x) = f(a - 0), \qquad
\lim_{E \ni x \to a + 0} f(x) = f(a + 0)
\]

exists, and strict inequality holds in at least one of the inequalities
$f(a - 0) \le f(a) \le f(a + 0)$ when $f$ is nondecreasing and
$f(a - 0) \ge f(a) \ge f(a + 0)$ when $f$ is nonincreasing. The function
assumes no values in the open interval defined by the strict inequality. Open
intervals of this kind determined by different points of discontinuity have no
points in common.


Proof. Indeed, if $a$ is a point of discontinuity, it must be a limit point of
the set $E$, and by Proposition 3 is a discontinuity of first kind. Thus at
least one of the bases $E \ni x \to a - 0$ and $E \ni x \to a + 0$ is defined,
and the limit of the function over that base exists. (When both bases are
defined, the limits over both bases exist.) For definiteness assume that $f$
is nondecreasing. Since $a$ is a point of discontinuity, strict inequality
must actually hold in at least one of the inequalities
$f(a - 0) \le f(a) \le f(a + 0)$. Since
$f(x) \le \lim_{E \ni x \to a - 0} f(x) = f(a - 0)$ if $x \in E$ and $x < a$,
the open interval $(f(a - 0), f(a))$ defined by the strict inequality
$f(a - 0) < f(a)$ is indeed devoid of values of the function. Analogously,
since $f(a + 0) \le f(x)$ if $x \in E$ and $a < x$, the open interval
$(f(a), f(a + 0))$ defined by the strict inequality $f(a) < f(a + 0)$ contains
no values of $f$.

Let $a_1$ and $a_2$ be two different points of discontinuity of $f$, and
assume $a_1 < a_2$. Then, since the function is nondecreasing,

\[
f(a_1 - 0) \le f(a_1) \le f(a_1 + 0) \le f(a_2 - 0) \le f(a_2) \le f(a_2 + 0).
\]

It follows from this that the intervals containing no values of $f$ and
corresponding to different points of discontinuity are disjoint. □


Corollary 2. The set of points of discontinuity of a monotonic function is 
at most countable. 


Proof. With each point of discontinuity of a monotonic function we associate 
the corresponding open interval in Corollary 1 containing no values of f. 
These intervals are pairwise disjoint. But on the line there cannot be more 
than a countable number of pairwise disjoint open intervals. In fact, one can 
choose a rational number in each of these intervals, so that the collection of 
intervals is equipollent with a subset of the set Q of rational numbers. Hence 
it is at most countable. Therefore, the set of points of discontinuity, which 
is in one-to-one correspondence with a set of such intervals, is also at most 
countable. O 


Proposition 4. (A criterion for continuity of a monotonic function.) A
monotonic function $f : E \to \mathbb{R}$ defined on a closed interval
$E = [a,b]$ is continuous if and only if its set of values $f(E)$ is the
closed interval⁶ with endpoints $f(a)$ and $f(b)$.


Proof. If $f$ is a continuous monotonic function, the monotonicity implies
that all the values that $f$ assumes on the closed interval $[a,b]$ lie
between the values $f(a)$ and $f(b)$ that it assumes at the endpoints. By
continuity, the function must assume all the values intermediate between
$f(a)$ and $f(b)$. Hence the set of values of a function that is monotonic and
continuous on a closed interval $[a,b]$ is indeed the closed interval with
endpoints $f(a)$ and $f(b)$.

Let us now prove the converse. Let $f$ be monotonic on the closed interval
$[a,b]$. If $f$ has a discontinuity at some point $c \in [a,b]$, by
Corollary 1 one of the open intervals $]f(c - 0), f(c)[$ and
$]f(c), f(c + 0)[$ is defined and nonempty and contains no values of $f$. But,
since $f$ is monotonic, that interval is contained in the interval with
endpoints $f(a)$ and $f(b)$. Hence if a monotonic function has a point of
discontinuity on the closed interval $[a,b]$, then the closed interval with
endpoints $f(a)$ and $f(b)$ cannot be contained in the range of values of the
function. □

6 Here $f(a) \le f(b)$ if $f$ is nondecreasing, and $f(b) \le f(a)$ if $f$ is
nonincreasing.


Theorem 5. (The inverse function theorem.) A function f : X → R that is
strictly monotonic on a set X ⊂ R has an inverse f⁻¹ : Y → R defined on
the set Y = f(X) of values of f. The function f⁻¹ : Y → R is monotonic
and has the same type of monotonicity on Y that f has on X.

If in addition X is a closed interval [a, b] and f is continuous on X, then
the set Y = f(X) is the closed interval with endpoints f(a) and f(b) and the
function f⁻¹ : Y → R is continuous on it.


Proof. The assertion that the set Y = f(X) is the closed interval with
endpoints f(a) and f(b) when X = [a, b] and f is continuous follows from
Proposition 4 proved above. It remains to be verified that f⁻¹ : Y → R
is continuous. But f⁻¹ is monotonic on Y, Y is a closed interval, and
f⁻¹(Y) = X = [a, b] is also a closed interval. We conclude by Proposition 4
that f⁻¹ is continuous on the interval Y with endpoints f(a) and f(b). □


Example 8. The function y = f(x) = sin x is increasing and continuous on
the closed interval [−π/2, π/2]. Hence the restriction of the function to the closed
interval [−π/2, π/2] has an inverse x = f⁻¹(y), which we denote x = arcsin y;
this function is defined on the closed interval [sin(−π/2), sin(π/2)] = [−1, 1],
increases from −π/2 to π/2, and is continuous on this closed interval.


Example 9. Similarly, the restriction of the function y = cos x to the closed
interval [0, π] is a decreasing continuous function, which by Theorem 5 has
an inverse denoted x = arccos y, defined on the closed interval [−1, 1] and
decreasing from π to 0 on that interval.


Example 10. The restriction of the function y = tan x to the open interval
X = ]−π/2, π/2[ is a continuous function that increases from −∞ to +∞. By
the first part of Theorem 5 it has an inverse denoted x = arctan y, defined
for all y ∈ R, and increasing within the open interval ]−π/2, π/2[ of its values.
To prove that the function x = arctan y is continuous at each point y₀ of its
domain of definition, we take the point x₀ = arctan y₀ and a closed interval
[x₀ − ε, x₀ + ε] containing x₀ and contained in the open interval ]−π/2, π/2[. If
x₀ − ε = arctan(y₀ − δ₁) and x₀ + ε = arctan(y₀ + δ₂), then for every y ∈ R
such that y₀ − δ₁ < y < y₀ + δ₂ we shall have x₀ − ε < arctan y < x₀ + ε. Hence
|arctan y − arctan y₀| < ε for −δ₁ < y − y₀ < δ₂. The former inequality holds
in particular if |y − y₀| < δ = min{δ₁, δ₂}, which verifies that the function
x = arctan y is continuous at the point y₀ ∈ R.

Example 11. By reasoning analogous to that of the preceding example, we
establish that since the restriction of the function y = cot x to the open
interval ]0, π[ is a continuous function that decreases from +∞ to −∞, it has
an inverse denoted x = arccot y, defined, continuous, and decreasing on the
entire real line R from π to 0 and assuming values in the range ]0, π[.


Remark. In constructing the graphs of mutually inverse functions y = f(x)
and x = f⁻¹(y) it is useful to keep in mind that in a given coordinate system
the points with coordinates (x, f(x)) = (x, y) and (y, f⁻¹(y)) = (y, x) are
symmetric with respect to the bisector of the angle in the first quadrant.

Thus the graphs of mutually inverse functions, when drawn in the same 
coordinate system, are symmetric with respect to this angle bisector. 
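
As a numerical illustration of Theorem 5 and Example 8, the following sketch inverts a continuous, strictly increasing function on a closed interval by bisection and compares the result with the library function `math.asin`. The helper names `inverse_increasing` and `arcsin_by_bisection` and the tolerance are hypothetical choices made for this illustration; they are not part of the text.

```python
import math

def inverse_increasing(f, a, b, y, tol=1e-12):
    """Approximate x in [a, b] with f(x) = y for continuous, strictly increasing f.

    Hypothetical helper for illustration: by Proposition 4 the values of f
    fill the closed interval [f(a), f(b)], so bisection applies for any y in it.
    """
    if not (f(a) <= y <= f(b)):
        raise ValueError("y must lie between f(a) and f(b)")
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid   # the solution lies to the right of mid
        else:
            hi = mid   # the solution lies at mid or to its left
    return (lo + hi) / 2

def arcsin_by_bisection(y):
    """Inverse of sin restricted to [-pi/2, pi/2], as in Example 8."""
    return inverse_increasing(math.sin, -math.pi / 2, math.pi / 2, y)

if __name__ == "__main__":
    for y in (-1.0, -0.5, 0.0, 0.3, 1.0):
        print(y, arcsin_by_bisection(y), math.asin(y))
```

The bisection uses only monotonicity and continuity of f, which is exactly what the theorem assumes; near the endpoints y = ±1 the computed value is less accurate because sin is nearly flat there.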


4.2.3 Problems and Exercises 


1. Show that
a) if f ∈ C(A) and B ⊂ A, then f|_B ∈ C(B);

b) if a function f : E₁ ∪ E₂ → R is such that f|_{Eᵢ} ∈ C(Eᵢ), i = 1, 2, it is not
always the case that f ∈ C(E₁ ∪ E₂);

c) the Riemann function R, and its restriction R|_Q to the set of rational numbers
are both discontinuous at each point of Q except 0, and all the points of discontinuity
are removable (see Example 12 of Sect. 4.1).


2. Show that for a function f ∈ C[a, b] the functions
$$ m(x) = \min_{a \le t \le x} f(t) \quad \text{and} \quad M(x) = \max_{a \le t \le x} f(t) $$
are also continuous on the closed interval [a, b].


3. a) Prove that the function inverse to a function that is monotonic on an open
interval is continuous on its domain of definition.

b) Construct a monotonic function with a countable set of discontinuities.

c) Show that if functions f : X → Y and f⁻¹ : Y → X are mutually inverse
(here X and Y are subsets of R), and f is continuous at a point x₀ ∈ X, the
function f⁻¹ need not be continuous at y₀ = f(x₀) in Y.


4. Show that

a) if f ∈ C[a, b] and g ∈ C[a, b], and, in addition, f(a) < g(a) and f(b) > g(b),
then there exists a point c ∈ [a, b] at which f(c) = g(c);

b) any continuous mapping f : [0, 1] → [0, 1] of a closed interval into itself has
a fixed point, that is, a point x ∈ [0, 1] such that f(x) = x;

c) if two continuous mappings f and g of an interval into itself commute, that
is, f ∘ g = g ∘ f, then they have a common fixed point;

d) a continuous mapping f : R → R may fail to have a fixed point;

e) a continuous mapping f : ]0, 1[ → ]0, 1[ may fail to have a fixed point;

f) if a mapping f : [0, 1] → [0, 1] is continuous, f(0) = 0, f(1) = 1, and
(f ∘ f)(x) ≡ x on [0, 1], then f(x) ≡ x.


5. Show that the set of values of any function that is continuous on a closed interval 
is a closed interval. 


6. Prove the following statements.

a) If a mapping f : [0, 1] → [0, 1] is continuous, f(0) = 0, f(1) = 1, and
fⁿ(x) := f ∘ ··· ∘ f(x) ≡ x on [0, 1] (with n factors f), then f(x) ≡ x.

b) If a function f : [0, 1] → [0, 1] is continuous and nondecreasing, then for any
point x ∈ [0, 1] at least one of the following situations must occur: either x is a
fixed point, or fⁿ(x) tends to a fixed point. (Here fⁿ(x) = f ∘ ··· ∘ f(x) is the nth
iteration of f.)


7. Let f : [0, 1] → R be a continuous function such that f(0) = f(1). Show that
a) for any n ∈ N there exists a horizontal closed interval of length 1/n with
endpoints on the graph of this function;

b) if the number l is not of the form 1/n there exists a function of this form on
whose graph one cannot inscribe a horizontal chord of length l.


8. The modulus of continuity of a function f : E → R is the function ω(δ) defined
for δ > 0 as follows:
$$ \omega(\delta) = \sup_{\substack{|x_1 - x_2| < \delta \\ x_1, x_2 \in E}} |f(x_1) - f(x_2)| . $$

Thus, the least upper bound is taken over all pairs of points x₁, x₂ of E whose
distance apart is less than δ.
Show that

a) the modulus of continuity is a nondecreasing nonnegative function having
the limit⁷ ω(+0) = lim_{δ→+0} ω(δ);

b) for every ε > 0 there exists δ > 0 such that for any points x₁, x₂ ∈ E the
relation |x₁ − x₂| < δ implies |f(x₁) − f(x₂)| < ω(+0) + ε;

c) if E is a closed interval, an open interval, or a half-open interval, the relation
ω(δ₁ + δ₂) ≤ ω(δ₁) + ω(δ₂)
holds for the modulus of continuity of a function f : E → R;

d) the moduli of continuity of the functions x and sin(x²) on the whole real axis
are respectively ω(δ) = δ and the constant ω(δ) = 2 in the domain δ > 0;

e) a function f is uniformly continuous on E if and only if ω(+0) = 0.

⁷ For this reason the modulus of continuity is usually considered for δ ≥ 0, setting
ω(0) = ω(+0).

9. Let f and g be bounded functions defined on the same set X. The quantity
Δ = sup_{x∈X} |f(x) − g(x)| is called the distance between f and g. It shows how well
one function approximates the other on the given set X. Let X be a closed interval
[a, b]. Show that if f, g ∈ C[a, b], then ∃x₀ ∈ [a, b], where Δ = |f(x₀) − g(x₀)|, and
that such is not the case in general for arbitrary bounded functions.


10. Let Pₙ(x) be a polynomial of degree n. We are going to approximate a bounded
function f : [a, b] → R by polynomials. Let
$$ \Delta(P_n) = \sup_{x \in [a,b]} |f(x) - P_n(x)| \quad \text{and} \quad E_n(f) = \inf_{P_n} \Delta(P_n) , $$
where the infimum is taken over all polynomials of degree n. A polynomial Pₙ is
called a polynomial of best approximation of f if Δ(Pₙ) = Eₙ(f).
Show that

a) there exists a polynomial P₀(x) ≡ a₀ of best approximation of degree zero;

b) among the polynomials Q_λ(x) of the form λPₙ(x), where Pₙ is a fixed
polynomial, there is a polynomial Q_{λ₀} such that
$$ \Delta(Q_{\lambda_0}) = \min_{\lambda \in \mathbb{R}} \Delta(Q_\lambda) ; $$

c) if there exists a polynomial of best approximation of degree n, there also
exists a polynomial of best approximation of degree n + 1;

d) for any bounded function on a closed interval and any n = 0, 1, 2, ... there
exists a polynomial of best approximation of degree n.


11. Prove the following statements.
a) A polynomial of odd degree with real coefficients has at least one real root.
b) If Pₙ is a polynomial of degree n, the function sgn Pₙ(x) has at most n points
of discontinuity.

c) If there are n + 2 points x₀ < x₁ < ··· < x_{n+1} in the closed interval [a, b]
such that the quantity
$$ \operatorname{sgn}\bigl[(f(x_i) - P_n(x_i))(-1)^i\bigr] $$
assumes the same value for i = 0, ..., n + 1, then Eₙ(f) ≥ min_{0≤i≤n+1} |f(xᵢ) − Pₙ(xᵢ)|.
(This result is known as Vallée Poussin’s theorem.⁸ For the definition of Eₙ(f) see
Problem 10.)

⁸ Ch. J. de la Vallée Poussin (1866-1962), Belgian mathematician and specialist
in theoretical mechanics.

12. a) Show that for any n ∈ N the function Tₙ(x) = cos(n arccos x) defined on
the closed interval [−1, 1] is an algebraic polynomial of degree n. (These are the
Chebyshev polynomials.)

b) Find an explicit algebraic expression for the polynomials T₁, T₂, T₃, and T₄
and draw their graphs.

c) Find the roots of the polynomial Tₙ(x) on the closed interval [−1, 1] and the
points of the interval where |Tₙ(x)| assumes its maximum value.

d) Show that among all polynomials Pₙ(x) of degree n whose leading coefficient
is 1 the polynomial Tₙ(x) is the unique polynomial closest to zero, that is, Eₙ(0) =
max |Tₙ(x)|. (For the definition of Eₙ(f) see Problem 10.)


13. Let f ∈ C[a, b].
a) Show that if the polynomial Pₙ(x) of degree n is such that there are n + 2
points x₀ < ··· < x_{n+1} (called Chebyshev alternant points) for which f(xᵢ) −
Pₙ(xᵢ) = (−1)ⁱ Δ(Pₙ) · α, where Δ(Pₙ) = max_{x∈[a,b]} |f(x) − Pₙ(x)| and α is a constant
equal to 1 or −1, then Pₙ(x) is the unique polynomial of best approximation of
degree n to f (see Problem 10).

b) Prove Chebyshev’s theorem: A polynomial Pₙ(x) of degree n is a polynomial
of best approximation to the function f ∈ C[a, b] if and only if there are at least
n + 2 Chebyshev alternant points on the closed interval [a, b].

c) Show that for discontinuous functions the preceding statement is in general
not true.

d) Find the polynomials of best approximation of degrees zero and one for the
function |x| on the interval [−1, 2].


14. In Sect. 4.2 we discussed the local properties of continuous functions. The 
present problem makes the concept of a local property more precise. 

Two functions f and g are considered equivalent if there is a neighborhood U (a) 
of a given point a € R such that f(x) = g(x) for all x € U(a). This relation between 
functions is obviously reflexive, symmetric, and transitive, that is, it really is an 
equivalence relation. 

A class of functions that are all equivalent to one another at a point a is called 
a germ of functions at a. If we consider only continuous functions, we speak of a 
germ of continuous functions at a. 

The local properties of functions are properties of the germs of functions. 


a) Define the arithmetic operations on germs of numerical-valued functions 
defined at a given point. 


b) Show that the arithmetic operations on germs of continuous functions do not 
lead outside this class of germs. 


c) Taking account of a) and b), show that the germs of continuous functions 
form a ring — the ring of germs of continuous functions. 


d) A subring I of a ring K is called an ideal of K if the product of every element
of the ring K with an element of the subring I belongs to I. Find an ideal in the
ring of germs of continuous functions at a.


15. An ideal in a ring is maximal if it is not contained in any larger ideal except
the ring itself. The set C[a, b] of functions continuous on a closed interval forms a
ring under the usual operations of addition and multiplication of numerical-valued
functions. Find the maximal ideals of this ring.


5 Differential Calculus 


5.1 Differentiable Functions 


5.1.1 Statement of the Problem and Introductory Considerations 


Suppose, following Newton,¹ we wish to solve the Kepler problem² of two
bodies, that is, we wish to explain the law of motion of one celestial body m (a
planet) relative to another body M (a star). We take a Cartesian coordinate
system in the plane of motion with origin at M (Fig. 5.1). Then the position
of m at time t can be characterized numerically by the coordinates (x(t), y(t))
of the point in that coordinate system. We wish to find the functions x(t)
and y(t).


Fig. 5.1.


The motion of m relative to M is governed by Newton’s two famous laws:

the general law of motion

ma = F , (5.1)

connecting the force vector with the acceleration vector that it produces via
the coefficient of proportionality m, the inertial mass of the body,³ and

the law of universal gravitation, which makes it possible to find the gravitational
action of the bodies m and M on each other according to the formula

$$ F = G\,\frac{mM}{|r|^{3}}\,r , \tag{5.2} $$

where r is a vector with its initial point in the body to which the force is
applied and its terminal point in the other body and |r| is the length of the
vector r, that is, the distance between m and M.

¹ I. Newton (1642-1727), British physicist, astronomer, and mathematician, an
outstanding scholar, who stated the basic laws of classical mechanics, discovered
the law of universal gravitation, and developed (along with Leibniz) the
foundations of differential and integral calculus. He was appreciated even by
his contemporaries, who inscribed on his tombstone: “Hic depositum est, quod
mortale fuit Isaaci Newtoni” (Here lies what was mortal of Isaac Newton).

² J. Kepler (1571-1630), famous German astronomer who discovered the laws of
motion of the planets (Kepler’s laws).

Knowing the masses m and M, we can easily use Eq. (5.2) to express 
the right-hand side of Eq. (5.1) in terms of the coordinates x(t) and y(t) of 
the body m at time t, and thereby take account of all the data for the given 
motion. 

To obtain the relations on x(t) and y(t) contained in Eq. (5.1), we must 
learn how to express the left-hand side of Eq. (5.1) in terms of x(t) and y(t). 

Acceleration is a characteristic of a change in velocity v(t). More precisely, 
it is simply the rate at which the velocity changes. Therefore, to solve the 
problem we must first of all learn how to compute the velocity v(t) at time 
t possessed by a body whose motion is described by the radius-vector r(t) = 
(x(t), y(t)). 

Thus we wish to define and learn how to compute the instantaneous ve- 
locity of a body that is implicit in the law of motion (5.1). 

To measure a thing is to compare it to a standard. In the present case,
what can serve as a standard for determining the instantaneous velocity of
motion?

The simplest kind of motion is that of a free body moving under inertia.
This is a motion under which equal displacements of the body in space
(as vectors) occur in equal intervals of time. It is the so-called uniform
(rectilinear) motion. If a point is moving uniformly, and r(0) and r(1) are its
radius-vectors relative to an inertial coordinate system at times t = 0 and
t = 1 respectively, then at any time t we shall have

r(t) − r(0) = v · t , (5.3)

where v = r(1) − r(0). Thus the displacement r(t) − r(0) turns out to be a
linear function of time in this simplest case, where the role of the constant
of proportionality between the displacement r(t) − r(0) and the time t is
played by the vector v that is the displacement in unit time. It is this vector
that we call the velocity of uniform motion. The fact that the motion is
rectilinear can be seen from the parametric representation of the trajectory:
r(t) = r(0) + v · t, which is the equation of a straight line, as you will recall
from analytic geometry.

³ We have denoted the mass by the same symbol we used for the body itself,
but this will not lead to any confusion. We remark also that if m ≪ M, the
coordinate system chosen can be considered inertial.

We thus know the velocity v of uniform rectilinear motion given by Eq. 
(5.3). By the law of inertia, if no external forces are acting on a body, it 
moves uniformly in a straight line. Hence if the action of M on m were to
cease at time t, the latter would continue its motion in a straight line at a
certain velocity from that time on. It is natural to regard that velocity as the
instantaneous velocity of the body at time t. 

However, such a definition of instantaneous velocity would remain a pure 
abstraction, giving us no guidance for explicit computation of the quantity, if 
not for the circumstance of primary importance that we are about to discuss. 

While remaining within the circle we have entered (logicians would call 
it a “vicious” circle) when we wrote down the equation of motion (5.1) and 
then undertook to determine what is meant by instantaneous velocity and 
acceleration, we nevertheless remark that, even with the most general ideas 
about these concepts, one can draw the following heuristic conclusions from 
Eq. (5.1). If there is no force, that is, F = 0, then the acceleration is also 
zero. But if the rate of change a(t) of the velocity v(t) is zero, then the 
velocity v(t) itself must not vary over time. In that way, we arrive at the law 
of inertia, according to which the body indeed moves in space with a velocity 
that is constant in time. 

From this same Equation (5.1) we can see that forces of bounded magnitude
are capable of creating only accelerations of bounded magnitude. But if
the absolute magnitude of the rate of change of a quantity P(t) over a time
interval [0, t] does not exceed some constant c, then, in our picture of the
situation, the change |P(t) − P(0)| in the quantity P over time t cannot exceed
c · t, that is, in this situation, the quantity changes by very little in a small
interval of time. (In any case, the function P(t) turns out to be continuous.)
Thus, in a real mechanical system the parameters change by small amounts
over a small time interval.

In particular, at all times t close to some time t₀ the velocity v(t) of the
body m must be close to the value v(t₀) that we wish to determine. But in
that case, in a small neighborhood of the time t₀ the motion itself must differ
by only a small amount from uniform motion at velocity v(t₀), and the closer
to t₀, the less it differs.

If we photographed the trajectory of the body m through a telescope,
then, depending on the power of the telescope, we would see approximately what
is shown in Fig. 5.2.

Fig. 5.2.

The portion of the trajectory shown in Fig. 5.2c corresponds to a time
interval so small that it is difficult to distinguish the actual trajectory from
a straight line, since this portion of the trajectory really does resemble a
straight line, and the motion resembles uniform rectilinear motion. From this
observation, as it happens, we can conclude that by solving the problem of
determining the instantaneous velocity (velocity being a vector quantity) we
will at the same time solve the purely geometric problem of defining and
finding the tangent to a curve (in the present case the curve is the trajectory
of motion).

Thus we have observed that in this problem we must have v(t) ≈ v(t₀)
for t close to t₀, that is, v(t) → v(t₀) as t → t₀, or, what is the same,
v(t) = v(t₀) + o(1) as t → t₀. Then we must also have

r(t) − r(t₀) ≈ v(t₀) · (t − t₀)

for t close to t₀. More precisely, the value of the displacement r(t) − r(t₀) is
equivalent to v(t₀)(t − t₀) as t → t₀, or

r(t) − r(t₀) = v(t₀)(t − t₀) + o(v(t₀)(t − t₀)) , (5.4)

where o(v(t₀)(t − t₀)) is a correction vector whose magnitude tends to zero
faster than the magnitude of the vector v(t₀)(t − t₀) as t → t₀. Here, naturally,
we must except the case when v(t₀) = 0. So as not to exclude this case
from consideration in general, it is useful to observe that⁴ |v(t₀)(t − t₀)| =
|v(t₀)| |t − t₀|. Thus, if |v(t₀)| ≠ 0, then the quantity |v(t₀)(t − t₀)| is of
the same order as |t − t₀|, and therefore o(v(t₀)(t − t₀)) = o(t − t₀). Hence,
instead of (5.4) we can write the relation

r(t) − r(t₀) = v(t₀)(t − t₀) + o(t − t₀) , (5.5)

which does not exclude the case v(t₀) = 0.

Thus, starting from the most general, and perhaps vague ideas about
velocity, we have arrived at Eq. (5.5), which the velocity must satisfy. But
the quantity v(t₀) can be found unambiguously from Eq. (5.5):

$$ v(t_0) = \lim_{t \to t_0} \frac{r(t) - r(t_0)}{t - t_0} . \tag{5.6} $$

Therefore both the fundamental relation (5.5) and the relation (5.6) equivalent
to it can now be taken as the definition of the quantity v(t₀), the
instantaneous velocity of the body at time t₀.

⁴ Here |t − t₀| is the absolute value of the number t − t₀, while |v| is the absolute
value, or length of the vector v.

At this point we shall not allow ourselves to be distracted into a detailed
discussion of the problem of the limit of a vector-valued function. Instead,
we shall confine ourselves to reducing it to the case of the limit of a real-valued
function, which has already been discussed in complete detail. Since
the vector r(t) − r(t₀) has coordinates (x(t) − x(t₀), y(t) − y(t₀)), we have

$$ \frac{r(t) - r(t_0)}{t - t_0} = \left( \frac{x(t) - x(t_0)}{t - t_0},\ \frac{y(t) - y(t_0)}{t - t_0} \right) , $$

and hence, if we regard vectors as being close together if their coordinates
are close together, the limit in (5.6) should be interpreted as follows:

$$ v(t_0) = \lim_{t \to t_0} \frac{r(t) - r(t_0)}{t - t_0} = \left( \lim_{t \to t_0} \frac{x(t) - x(t_0)}{t - t_0},\ \lim_{t \to t_0} \frac{y(t) - y(t_0)}{t - t_0} \right) , $$

and the term o(t − t₀) in (5.5) should be interpreted as a vector depending
on t such that the vector o(t − t₀)/(t − t₀) tends (coordinatewise) to zero as t → t₀.

Finally, we remark that if v(t₀) ≠ 0, then the equation

r − r(t₀) = v(t₀) · (t − t₀) (5.7)

defines a line, which by the circumstances indicated above should be regarded
as the tangent to the trajectory at the point (x(t₀), y(t₀)).
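
The coordinatewise limit in (5.6) can be watched numerically. The sketch below is only an illustration under our own assumptions: the motion r(t) = (cos t, sin t) and the names `r` and `difference_quotient` are hypothetical choices, and the exact velocity (−sin t₀, cos t₀) of this particular motion is taken as known rather than derived here.

```python
import math

def r(t):
    """Radius-vector of a point moving uniformly on the unit circle (illustrative choice)."""
    return (math.cos(t), math.sin(t))

def difference_quotient(r, t0, t):
    """Coordinatewise quotient (r(t) - r(t0)) / (t - t0), the expression under the limit in (5.6)."""
    x0, y0 = r(t0)
    x, y = r(t)
    dt = t - t0
    return ((x - x0) / dt, (y - y0) / dt)

if __name__ == "__main__":
    t0 = 1.0
    exact = (-math.sin(t0), math.cos(t0))  # known velocity of this particular motion
    for k in range(1, 7):
        h = 10.0 ** (-k)
        vx, vy = difference_quotient(r, t0, t0 + h)
        err = math.hypot(vx - exact[0], vy - exact[1])
        print(f"h = {h:.0e}  quotient = ({vx:.6f}, {vy:.6f})  error = {err:.2e}")
```

The printed error shrinks roughly in proportion to h, which is the behavior one expects when the correction term in (5.5) is o(t − t₀).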

Thus, the standard for defining the velocity of a motion is the velocity of
uniform rectilinear motion defined by the linear relation (5.7). The standard
motion (5.7) is connected with the motion being studied as shown by relation
(5.5). The value v(t₀) at which (5.5) holds can be found by passing to the
limit in (5.6) and is called the velocity of motion at time t₀. The motions
studied in classical mechanics, which are described by the law (5.1), must
admit comparison with this standard, that is, they must admit of the linear
approximation indicated in (5.5).

If r(t) = (x(t), y(t)) is the radius-vector of a moving point m at time t,
ṙ(t) = (ẋ(t), ẏ(t)) = v(t) is the vector that gives the rate of change of
r(t) at time t, and r̈(t) = (ẍ(t), ÿ(t)) = a(t) is the vector that gives the rate
of change of v(t) (acceleration) at time t, then Eq. (5.1) can be written in
the form

m · r̈(t) = F(t) ,

from which we obtain in coordinate form for motion in a gravitational field

$$ \begin{cases}
\ddot{x}(t) = -GM\,\dfrac{x(t)}{\bigl[x^2(t) + y^2(t)\bigr]^{3/2}} , \\[2ex]
\ddot{y}(t) = -GM\,\dfrac{y(t)}{\bigl[x^2(t) + y^2(t)\bigr]^{3/2}} .
\end{cases} \tag{5.8} $$

This is a precise mathematical expression of our original problem. Since
we know how to find ṙ(t) from r(t) and then how to find r̈(t), we are already
in a position to answer the question whether a pair of functions (x(t), y(t))
can describe the motion of the body m about the body M. To answer this
question, one must find ẍ(t) and ÿ(t) and check whether Eqs. (5.8) hold. The
system (5.8) is an example of a system of so-called differential equations. At
this point we can only check whether a set of functions is a solution of the
system. How to find the solution or, better expressed, how to investigate the
properties of solutions of differential equations, is studied in a special and, as
one can now appreciate, critical area of analysis: the theory of differential
equations.
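
Checking whether a given pair of functions satisfies the system (5.8) can be sketched numerically. Everything specific below is an assumption made for illustration: the trial motion is a circle of radius `R` traversed with angular velocity `OMEGA` chosen so that OMEGA² · R³ = GM, the units are arbitrary, and the second derivatives are replaced by central second differences rather than computed exactly.

```python
import math

GM = 1.0                       # product of G and the mass M (illustrative units)
R = 2.0                        # radius of the trial circular orbit (illustrative)
OMEGA = math.sqrt(GM / R**3)   # angular velocity chosen so that OMEGA**2 * R**3 == GM

def x(t):
    return R * math.cos(OMEGA * t)

def y(t):
    return R * math.sin(OMEGA * t)

def second_derivative(f, t, h=1e-4):
    """Central second difference, a numerical stand-in for the second derivative."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

def residuals(t):
    """Left-hand side minus right-hand side of each equation of system (5.8)."""
    d = (x(t) ** 2 + y(t) ** 2) ** 1.5
    return (second_derivative(x, t) + GM * x(t) / d,
            second_derivative(y, t) + GM * y(t) / d)

if __name__ == "__main__":
    for t in (0.0, 0.7, 3.0):
        print(t, residuals(t))
```

Residuals that are small compared with the size of the terms themselves indicate, within the accuracy of the finite differences, that this circular motion is compatible with (5.8); this is a plausibility check, not a proof.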

The operation of finding the rate of change of a vector quantity, as has 
been shown, reduces to finding the rates of change of several numerical-valued 
functions — the coordinates of the vector. Thus we must first of all learn how 
to carry out this operation easily in the simplest case of real-valued functions 
of a real-valued argument, which we now take up. 


5.1.2 Functions Differentiable at a Point 


We begin with two preliminary definitions that we shall shortly make precise. 


Definition 0₁. A function f : E → R defined on a set E ⊂ R is differentiable
at a point a ∈ E that is a limit point of E if there exists a linear function
A · (x − a) of the increment x − a of the argument such that f(x) − f(a) can
be represented as


f(x) − f(a) = A · (x − a) + o(x − a) as x → a, x ∈ E . (5.9)


In other words, a function is differentiable at a point a if the change in its 
values in a neighborhood of the point in question is linear up to a correction 
that is infinitesimal compared with the magnitude of the displacement x — a 
from the point a. 


Remark. As a rule we have to deal with functions defined in an entire neigh- 
borhood of the point in question, not merely on a subset of the neighborhood. 


Definition 0₂. The linear function A · (x − a) in Eq. (5.9) is called the
differential of the function f at a.


The differential of a function at a point is uniquely determined; for it
follows from (5.9) that

$$ \lim_{E \ni x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{E \ni x \to a} \left( A + \frac{o(x - a)}{x - a} \right) = A , $$

so that the number A is unambiguously determined due to the uniqueness of
the limit.

Definition 1. The number

$$ f'(a) = \lim_{E \ni x \to a} \frac{f(x) - f(a)}{x - a} \tag{5.10} $$

is called the derivative of the function f at a.


Relation (5.10) can be rewritten in the equivalent form

$$ \frac{f(x) - f(a)}{x - a} = f'(a) + \alpha(x) , $$

where α(x) → 0 as x → a, x ∈ E, which in turn is equivalent to

f(x) − f(a) = f'(a)(x − a) + o(x − a) as x → a, x ∈ E . (5.11)


Thus, differentiability of a function at a point is equivalent to the existence
of its derivative at the same point.
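
As a hedged illustration of these definitions (the function f(x) = x² on E = R is our own choice, not an example taken from the text above), relations (5.9) and (5.11) can be verified directly at an arbitrary point a:

```latex
\begin{aligned}
f(x) - f(a) &= x^2 - a^2 = 2a\,(x - a) + (x - a)^2 , \\
\frac{(x - a)^2}{x - a} &= x - a \to 0 \quad \text{as } x \to a .
\end{aligned}
```

Hence (x − a)² = o(x − a) as x → a, the number A equals 2a, the differential at a is the linear function 2a · (x − a), and f'(a) = 2a.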

If we compare these definitions with what was said in Subsect. 5.1.1, we 
can conclude that the derivative characterizes the rate of change of a function 
at the point under consideration, while the differential provides the best linear 
approximation to the increment of the function in a neighborhood of the same 
point. 

If a function f : E — R is differentiable at different points of the set 
E, then in passing from one point to another both the quantity A and the 
function o(x — a) in Eq. (5.9) may change (a result at which we have already 
arrived explicitly in (5.11)). This circumstance should be noted in the very 
definition of a differentiable function, and we now write out this fundamental 
definition in full. 


Definition 2. A function f : E → R defined on a set E ⊂ R is differentiable
at a point x ∈ E that is a limit point of E if


f(x + h) − f(x) = A(x)h + α(x; h) , (5.12)


where h ↦ A(x)h is a linear function in h and α(x; h) = o(h) as h → 0,
x + h ∈ E.


The quantities

Δx(h) := (x + h) − x = h

and

Δf(x; h) := f(x + h) − f(x)

are called respectively the increment of the argument and the increment of
the function (corresponding to this increment in the argument).

They are often denoted (not quite legitimately, to be sure) by the symbols
Δx and Δf(x) representing functions of h.

Thus, a function is differentiable at a point if its increment at that point,
regarded as a function of the increment h in its argument, is linear up to a
correction that is infinitesimal compared to h as h → 0.

Definition 3. The function h ↦ A(x)h of Definition 2, which is linear in h,
is called the differential of the function f : E → R at the point x ∈ E and is
denoted df(x) or Df(x).


Thus, df(x)(h) = A(x)h.
From Definitions 2 and 3 we have


Δf(x; h) − df(x)(h) = α(x; h) ,


and α(x; h) = o(h) as h → 0, x + h ∈ E; that is, the difference between
the increment of the function due to the increment h in its argument and
the value of the function df(x), which is linear in h, at the same h, is an
infinitesimal of higher order than the first in h.

For that reason, we say that the differential is the (principal) linear part
of the increment of the function.
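
The relation between the increment and the differential can be tabulated for a concrete function. The sketch below assumes, for illustration only, f(x) = x³ and the point x = 2; the coefficient A(x) = 3x² is obtained by hand from (x + h)³ − x³ = 3x²h + 3xh² + h³, and the function names are hypothetical.

```python
def f(x):
    return x ** 3

def increment(f, x, h):
    """Increment of the function: Delta f(x; h) = f(x + h) - f(x)."""
    return f(x + h) - f(x)

def differential(x, h):
    """df(x)(h) = A(x) h with A(x) = 3 x**2 for f(x) = x**3 (computed by hand)."""
    return 3 * x ** 2 * h

if __name__ == "__main__":
    x = 2.0
    for k in range(1, 6):
        h = 10.0 ** (-k)
        alpha = increment(f, x, h) - differential(x, h)   # the remainder alpha(x; h)
        print(f"h = {h:.0e}  alpha = {alpha:.3e}  alpha/h = {alpha / h:.3e}")
```

For this function the remainder is 3xh² + h³, so the ratio alpha/h behaves like 6h + h² at x = 2 and tends to zero with h, in agreement with the requirement that the remainder be o(h).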

As follows from relation (5.12) and Definition 1,

$$ A(x) = f'(x) = \lim_{\substack{h \to 0 \\ x + h,\, x \in E}} \frac{f(x + h) - f(x)}{h} , $$

and so the differential can be written as

df(x)(h) = f'(x)h . (5.13)

In particular, if f(x) ≡ x, we obviously have f'(x) ≡ 1 and

dx(h) = 1 · h = h ,

so that it is sometimes said that “the differential of an independent variable
equals its increment”.

Taking this equality into account, we deduce from (5.13) that

df(x)(h) = f'(x) dx(h) , (5.14)

that is,

df(x) = f'(x) dx . (5.15)

The equality (5.15) should be understood as the equality of two functions
of h.

From (5.14) we obtain

$$ \frac{df(x)(h)}{dx(h)} = f'(x) , \tag{5.16} $$

that is, the function df(x)/dx (the ratio of the functions df(x) and dx) is constant
and equals f'(x). For this reason, following Leibniz, we frequently denote the
derivative by the symbol df(x)/dx, alongside the notation f'(x) proposed by
Lagrange.⁵

In mechanics, in addition to these symbols, the symbol φ̇(t) (read “phi-dot
of t”) is also used to denote the derivative of the function φ(t) with respect
to time t.


5.1.3 The Tangent Line; Geometric Meaning 
of the Derivative and Differential 


Let f : E → R be a function defined on a set E ⊂ R and x₀ a given limit
point of E. We wish to choose the constant c₀ so as to give the best possible
description of the behavior of the function in a neighborhood of the point x₀
among constant functions. More precisely, we want the difference f(x) − c₀
to be infinitesimal compared with any nonzero constant as x → x₀, x ∈ E,
that is

f(x) = c₀ + o(1) as x → x₀, x ∈ E . (5.17)


This last relation is equivalent to saying lim_{E∋x→x₀} f(x) = c₀. If, in particular,
the function is continuous at x₀, then lim_{E∋x→x₀} f(x) = f(x₀), and naturally
c₀ = f(x₀).


Now let us try to choose the function c₀ + c₁(x − x₀) so as to have

f(x) = c₀ + c₁(x − x₀) + o(x − x₀) as x → x₀, x ∈ E . (5.18)


This is obviously a generalization of the preceding problem, since the formula
(5.17) can be rewritten as


f(x) = c₀ + o((x − x₀)⁰) as x → x₀, x ∈ E .


It follows immediately from (5.18) that c₀ = lim_{E∋x→x₀} f(x), and if the
function is continuous at this point, then c₀ = f(x₀).
If c₀ has been found, it then follows from (5.18) that

$$ c_1 = \lim_{E \ni x \to x_0} \frac{f(x) - c_0}{x - x_0} . $$

And, in general, if we were seeking a polynomial Pₙ(x₀; x) = c₀ + c₁(x − x₀) +
··· + cₙ(x − x₀)ⁿ such that


f(x) = c₀ + c₁(x − x₀) + ··· + cₙ(x − x₀)ⁿ + o((x − x₀)ⁿ)
as x → x₀, x ∈ E , (5.19)

we would find successively, with no ambiguity, that

$$ \begin{aligned}
c_0 &= \lim_{E \ni x \to x_0} f(x) , \\
c_1 &= \lim_{E \ni x \to x_0} \frac{f(x) - c_0}{x - x_0} , \\
&\;\;\vdots \\
c_n &= \lim_{E \ni x \to x_0} \frac{f(x) - \bigl[c_0 + \cdots + c_{n-1}(x - x_0)^{n-1}\bigr]}{(x - x_0)^n} ,
\end{aligned} $$

assuming that all these limits exist. Otherwise condition (5.19) cannot be
fulfilled, and the problem has no solution.

⁵ J. L. Lagrange (1736-1813), famous French mathematician and specialist in
theoretical mechanics.
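
A worked instance of this successive determination of coefficients, under the assumption (ours, made only for illustration) that f(x) = x³, E = R, and x₀ = 1:

```latex
\begin{aligned}
c_0 &= \lim_{x \to 1} x^3 = 1 , \\
c_1 &= \lim_{x \to 1} \frac{x^3 - 1}{x - 1} = \lim_{x \to 1} (x^2 + x + 1) = 3 , \\
c_2 &= \lim_{x \to 1} \frac{x^3 - 1 - 3(x - 1)}{(x - 1)^2}
     = \lim_{x \to 1} \frac{(x - 1)^2 (x + 2)}{(x - 1)^2} = 3 , \\
c_3 &= \lim_{x \to 1} \frac{x^3 - 1 - 3(x - 1) - 3(x - 1)^2}{(x - 1)^3}
     = \lim_{x \to 1} \frac{(x - 1)^3}{(x - 1)^3} = 1 .
\end{aligned}
```

This agrees with the algebraic identity x³ = 1 + 3(x − 1) + 3(x − 1)² + (x − 1)³, for which the remainder in (5.19) with n = 3 is identically zero.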

If the function f is continuous at zo, it follows from (5.18), as already 
pointed out, that co = f(x); and we then arrive at the relation 


f(x) — f(ao) = c1(£ — z0) + O(a — zo) as t > to TE E, 


which is equivalent to the condition that f(x) be differentiable at xo. 
From this we find 


a= pi, eat 


= f (zo) . 


We have thus proved the following proposition. 


Proposition 1. A function f : E — R that is continuous at a point xo € E 
that is a limit point of E C R admits a linear approximation (5.18) if and 
only if it is differentiable at the point. 


The function 
p(x) = Co + Cı (x = £o) (5.20) 


with co = f (£o) and cı = f'(xo) is the only function of the form (5.20) that 
satisfies (5.18). 
Thus the function

$$\varphi(x) = f(x_0) + f'(x_0)(x - x_0) \tag{5.21}$$

provides the best linear approximation to the function $f$ in a neighborhood
of $x_0$ in the sense that for any other function $\varphi(x)$ of the form (5.20) we have
$f(x) - \varphi(x) \ne o(x - x_0)$ as $x \to x_0$, $x \in E$.
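
The sense in which (5.21) is "best" can be observed numerically. The following is only an illustrative sketch: the choice $f = \exp$, the point $x_0 = 1$, the competing slope 2, and the helper name `remainder_ratio` are all assumptions made for this example. For the tangent the ratio of the error to the displacement tends to 0; for a line with the same $c_0$ but a different slope it tends to a nonzero constant.

```python
import math

def remainder_ratio(f, line, x0, h):
    """Return (f(x0 + h) - line(x0 + h)) / h: the error relative to the displacement h."""
    x = x0 + h
    return (f(x) - line(x)) / h

x0 = 1.0
f = math.exp

def tangent(x):
    # The function (5.21) for f = exp at x0: f(x0) + f'(x0)(x - x0)
    return math.exp(x0) + math.exp(x0) * (x - x0)

def other(x):
    # Same value at x0, but slope 2 instead of f'(x0) = e
    return math.exp(x0) + 2.0 * (x - x0)

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, remainder_ratio(f, tangent, x0, h), remainder_ratio(f, other, x0, h))
```

The second column shrinks roughly in proportion to $h$, while the third approaches $e - 2$, the difference of the slopes.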

The graph of the function (5.21) is the straight line

$$y - f(x_0) = f'(x_0)(x - x_0), \tag{5.22}$$

passing through the point $(x_0, f(x_0))$ and having slope $f'(x_0)$.

Since the line (5.22) provides the optimal linear approximation of the
graph of the function $y = f(x)$ in a neighborhood of the point $(x_0, f(x_0))$, it
is natural to make the following definition.

Definition 4. If a function $f : E \to \mathbb{R}$ is defined on a set $E \subset \mathbb{R}$ and
differentiable at a point $x_0 \in E$, the line defined by Eq. (5.22) is called the
tangent to the graph of this function at the point $(x_0, f(x_0))$.


Figure 5.3 illustrates all the basic concepts we have so far introduced in
connection with differentiability of a function at a point: the increment of the
argument, the increment of the function corresponding to it, and the value
of the differential. The figure shows the graph of the function, the tangent
to the graph at the point $P_0 = (x_0, f(x_0))$, and for comparison, an arbitrary
line (usually called a secant) passing through $P_0$ and some point $P \ne P_0$ of
the graph of the function.


Fig. 5.3. The graph of $y = f(x)$, the tangent $y - f(x_0) = f'(x_0)(x - x_0)$, a secant through $P_0$, and the increment $\Delta f(x_0; h)$ from $f(x_0)$ to $f(x_0 + h)$.


The following definition extends Definition 4. 


Definition 5. If the mappings $f : E \to \mathbb{R}$ and $g : E \to \mathbb{R}$ are continuous at
a point $x_0 \in E$ that is a limit point of $E$ and $f(x) - g(x) = o((x - x_0)^n)$
as $x \to x_0$, $x \in E$, we say that $f$ and $g$ have $n$th order contact at $x_0$ (more
precisely, contact of order at least $n$).

For $n = 1$ we say that the mappings $f$ and $g$ are tangent to each other at
$x_0$.


According to Definition 5 the mapping (5.21) is tangent at $x_0$ to a mapping
$f : E \to \mathbb{R}$ that is differentiable at that point.

We can now also say that the polynomial
$P_n(x_0; x) = c_0 + c_1(x - x_0) + \cdots + c_n(x - x_0)^n$ of relation (5.19) has contact
of order at least $n$ with the function $f$.

The number $h = x - x_0$, that is, the increment of the argument, can be
regarded as a vector attached to the point $x_0$ and defining the transition from
$x_0$ to $x = x_0 + h$. We denote the set of all such vectors by $T\mathbb{R}(x_0)$ or $T\mathbb{R}_{x_0}$.⁶
Similarly, we denote by $T\mathbb{R}(y_0)$ or $T\mathbb{R}_{y_0}$ the set of all displacement vectors
from the point $y_0$ along the $y$-axis (see Fig. 5.3). It can then be seen from
the definition of the differential that the mapping

$$df(x_0) : T\mathbb{R}(x_0) \to T\mathbb{R}(f(x_0)), \tag{5.23}$$

defined by the differential $h \mapsto f'(x_0)h = df(x_0)(h)$, is tangent to the mapping

$$h \mapsto f(x_0 + h) - f(x_0) = \Delta f(x_0; h), \tag{5.24}$$

defined by the increment of a differentiable function.

We remark (see Fig. 5.3) that if the mapping (5.24) is the increment of the
ordinate of the graph of the function $y = f(x)$ as the argument passes from
$x_0$ to $x_0 + h$, then the differential (5.23) gives the increment in the ordinate
of the tangent to the graph of the function for the same increment $h$ in the
argument.


5.1.4 The Role of the Coordinate System 


The analytic definition of a tangent (Definition 4) may be the cause of some
vague uneasiness. We shall try to state what it is exactly that makes one
uneasy. However, we shall first point out a more geometric construction of
the tangent to a curve at one of its points $P_0$ (see Fig. 5.3).

Take an arbitrary point $P$ of the curve different from $P_0$. The line determined
by the pair of points $P_0$ and $P$, as already noted, is called a secant
in relation to the curve. We now force the point $P$ to approach $P_0$ along the
curve. If the secant tends to some limiting position as we do so, that limiting
position of the secant is the tangent to the curve at $P_0$.

Despite its intuitive nature, such a definition of the tangent is not available 
to us at the moment, since we do not know what a curve is, what it means to 
say that “a point tends to another point along a curve”, and finally, in what 
sense we are to interpret the phrase “limiting position of the secant”. 

Rather than make all these concepts precise, we point out a fundamental 
difference between the two definitions of tangent that we have introduced. 
The second was purely geometric, unconnected (at least until it is made 
more precise) with any coordinate system. In the first case, however, we have 
defined the tangent to a curve that is the graph of a differentiable function 
in some coordinate system. The question naturally arises whether, if the 
curve is written in a different coordinate system, it might not cease to be 
differentiable, or might be differentiable but yield a different line as tangent 
when the computations are carried out in the new coordinates. 

This question of invariance, that is, independence of the coordinate system,
always arises when a concept is introduced using a coordinate system.
The question applies in equal measure to the concept of velocity, which we
discussed in Subsect. 5.1.1 and which, as we have mentioned already, includes
the concept of a tangent.

6 This is a slight deviation from the more common notation $T_{x_0}\mathbb{R}$ or $T_{x_0}(\mathbb{R})$.

Points, vectors, lines, and so forth have different numerical characteris- 
tics in different coordinate systems (coordinates of a point, coordinates of a 
vector, equation of a line). However, knowing the formulas that connect two 
coordinate systems, one can always determine from two numerical represen- 
tations of the same type whether or not they are expressions for the same 
geometric object in different coordinate systems. Intuition suggests that the 
procedure for defining velocity described in Subsect. 5.1.1 leads to the same 
vector independently of the coordinate system in which the computations 
are carried out. At the appropriate time in the study of functions of several 
variables we shall give a detailed discussion of questions of this sort. The 
invariance of the definition of velocity with respect to different coordinate 
systems will be verified in the next section. 

Before passing to the study of specific examples, we now summarize some 
of the results. 

We have encountered the problem of describing mathematically the
instantaneous velocity of a moving body.

This problem led us to the problem of approximating a given function in 
the neighborhood of a given point by a linear function, which on the geometric 
level led to the concept of the tangent. Functions describing the motion of a 
real mechanical system are assumed to admit such a linear approximation. 

In this way we have distinguished the class of differentiable functions in 
the class of all functions. 

The concept of the differential of a function at a point has been intro- 
duced. The differential is a linear mapping defined on displacements from the 
point under consideration that describes the behavior of the increment of a 
differentiable function in a neighborhood of the point, up to a quantity that 
is infinitesimal in comparison with the displacement. 

The differential $df(x_0)h = f'(x_0)h$ is completely determined by the number
$f'(x_0)$, the derivative of the function $f$ at $x_0$, which can be found by
taking the limit

$$f'(x_0) = \lim_{E \ni x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}.$$

The physical meaning of the derivative is the rate of change of the quantity
$f(x)$ at time $x_0$; its geometrical meaning is the slope of the tangent to the
graph of the function $y = f(x)$ at the point $(x_0, f(x_0))$.

5.1.5 Some Examples 


Example 1. Let $f(x) = \sin x$. We shall show that $f'(x) = \cos x$.

Proof.

$$\lim_{h \to 0} \frac{\sin(x + h) - \sin x}{h} = \lim_{h \to 0} \frac{2 \sin\left(\frac{h}{2}\right) \cos\left(x + \frac{h}{2}\right)}{h} = \lim_{h \to 0} \cos\left(x + \frac{h}{2}\right) \cdot \lim_{h \to 0} \frac{\sin\left(\frac{h}{2}\right)}{\left(\frac{h}{2}\right)} = \cos x. \quad \Box$$

Here we have used the theorem on the limit of a product, the continuity
of the function $\cos x$, the equivalence $\sin t \sim t$ as $t \to 0$, and the theorem on
the limit of a composite function.


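Example 2. We shall show that $\cos' x = -\sin x$.

Proof.

$$\lim_{h \to 0} \frac{\cos(x + h) - \cos x}{h} = \lim_{h \to 0} \frac{-2 \sin\left(\frac{h}{2}\right) \sin\left(x + \frac{h}{2}\right)}{h} = -\lim_{h \to 0} \sin\left(x + \frac{h}{2}\right) \cdot \lim_{h \to 0} \frac{\sin\left(\frac{h}{2}\right)}{\left(\frac{h}{2}\right)} = -\sin x. \quad \Box$$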


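Example 3. We shall show that if $f(t) = r \cos \omega t$, then $f'(t) = -r\omega \sin \omega t$.

Proof.

$$\lim_{h \to 0} \frac{r \cos \omega(t + h) - r \cos \omega t}{h} = r \lim_{h \to 0} \frac{-2 \sin\left(\frac{\omega h}{2}\right) \sin \omega\left(t + \frac{h}{2}\right)}{h} = -r\omega \lim_{h \to 0} \sin \omega\left(t + \frac{h}{2}\right) \cdot \lim_{h \to 0} \frac{\sin\left(\frac{\omega h}{2}\right)}{\left(\frac{\omega h}{2}\right)} = -r\omega \sin \omega t. \quad \Box$$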


Example 4. If $f(t) = r \sin \omega t$, then $f'(t) = r\omega \cos \omega t$.
Proof. The proof is analogous to that of Examples 1 and 3. □
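
The limits in Examples 1-4 can be sanity-checked by evaluating the difference quotient for shrinking $h$. This is a rough numerical sketch, not a proof; the point $x = 0.7$, the constants $r = 2$, $\omega = 3$, and the helper name `difference_quotient` are assumptions chosen only for illustration.

```python
import math

def difference_quotient(f, x, h):
    """The quotient (f(x + h) - f(x)) / h whose limit as h -> 0 is f'(x)."""
    return (f(x + h) - f(x)) / h

R, OMEGA = 2.0, 3.0  # illustrative values of r and omega

def circular(t):
    # f(t) = r cos(omega t) from Example 3
    return R * math.cos(OMEGA * t)

x = 0.7
for h in (1e-2, 1e-4, 1e-6):
    err_sin = difference_quotient(math.sin, x, h) - math.cos(x)
    err_cos = difference_quotient(math.cos, x, h) + math.sin(x)
    err_circ = difference_quotient(circular, x, h) + R * OMEGA * math.sin(OMEGA * x)
    print(f"h={h:g}  sin: {err_sin:+.2e}  cos: {err_cos:+.2e}  r cos(wt): {err_circ:+.2e}")
```

Each printed deviation from the claimed derivative shrinks roughly in proportion to $h$.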


Example 5. The instantaneous velocity and instantaneous acceleration of a
point mass. Suppose a point mass is moving in a plane and that in some
given coordinate system its motion is described by differentiable functions of
time

$$x = x(t), \quad y = y(t)$$

or, what is the same, by a vector

$$\mathbf{r}(t) = (x(t), y(t)).$$

As we have explained in Subsect. 5.1.1, the velocity of the point at time $t$ is
the vector

$$\mathbf{v}(t) = \dot{\mathbf{r}}(t) = (\dot{x}(t), \dot{y}(t)),$$

where $\dot{x}(t)$ and $\dot{y}(t)$ are the derivatives of $x(t)$ and $y(t)$ with respect to time $t$.
The acceleration $\mathbf{a}(t)$ is the rate of change of the vector $\mathbf{v}(t)$, so that

$$\mathbf{a}(t) = \dot{\mathbf{v}}(t) = \ddot{\mathbf{r}}(t) = (\ddot{x}(t), \ddot{y}(t)),$$

where $\ddot{x}(t)$ and $\ddot{y}(t)$ are the derivatives of the functions $\dot{x}(t)$ and $\dot{y}(t)$ with
respect to time, the so-called second derivatives of $x(t)$ and $y(t)$.

Thus, in the sense of the physical problem, functions x(t) and y(t) that 
describe the motion of a point mass must have both first and second deriva- 
tives. 

In particular, let us consider the uniform motion of a point along a circle 
of radius r. Let w be the angular velocity of the point, that is, the magnitude 
of the central angle over which the point moves in unit time. 

In Cartesian coordinates (by the definitions of the functions $\cos x$ and
$\sin x$) this motion is written in the form

$$\mathbf{r}(t) = (r \cos(\omega t + \alpha), r \sin(\omega t + \alpha)),$$

and if $\mathbf{r}(0) = (r, 0)$, it assumes the form

$$\mathbf{r}(t) = (r \cos \omega t, r \sin \omega t).$$

Without loss of generality in our subsequent deductions, for the sake of
brevity, we shall assume that $\mathbf{r}(0) = (r, 0)$.
Then by the results of Examples 3 and 4 we have

$$\mathbf{v}(t) = \dot{\mathbf{r}}(t) = (-r\omega \sin \omega t, r\omega \cos \omega t).$$

From the computation of the inner product

$$\langle \mathbf{v}(t), \mathbf{r}(t) \rangle = -r^2\omega \sin \omega t \cos \omega t + r^2\omega \cos \omega t \sin \omega t = 0,$$

as one should expect in this case, we find that the velocity vector $\mathbf{v}(t)$ is
orthogonal to the radius-vector $\mathbf{r}(t)$ and is therefore directed along the tangent
to the circle.

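Next, for the acceleration, we have

$$\mathbf{a}(t) = \dot{\mathbf{v}}(t) = \ddot{\mathbf{r}}(t) = (-r\omega^2 \cos \omega t, -r\omega^2 \sin \omega t),$$

that is, $\mathbf{a}(t) = -\omega^2 \mathbf{r}(t)$, and the acceleration is thus indeed centripetal, since
it has the direction opposite to that of the radius-vector $\mathbf{r}(t)$.

Moreover,

$$|\mathbf{a}(t)| = \omega^2 |\mathbf{r}(t)| = \omega^2 r = \frac{|\mathbf{v}(t)|^2}{r} = \frac{v^2}{r},$$

where $v = |\mathbf{v}(t)|$.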
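
The three conclusions about uniform circular motion (velocity orthogonal to the radius-vector, acceleration equal to $-\omega^2 \mathbf{r}$, and $|\mathbf{a}| = v^2/r$) can be checked with finite differences. The sketch below is only illustrative: the values of `R` and `OMEGA`, the step sizes, and the function names are assumptions, and central differences replace the exact derivatives.

```python
import math

R, OMEGA = 2.0, 3.0  # illustrative radius and angular velocity

def position(t):
    return (R * math.cos(OMEGA * t), R * math.sin(OMEGA * t))

def velocity(t, h=1e-5):
    # Central difference approximation to the derivative of position
    (x1, y1), (x2, y2) = position(t - h), position(t + h)
    return ((x2 - x1) / (2 * h), (y2 - y1) / (2 * h))

def acceleration(t, h=1e-4):
    # Second central difference approximation to the second derivative
    x0, y0 = position(t)
    (x1, y1), (x2, y2) = position(t - h), position(t + h)
    return ((x2 - 2 * x0 + x1) / h**2, (y2 - 2 * y0 + y1) / h**2)

def dot(u, w):
    return u[0] * w[0] + u[1] * w[1]

t = 0.4
r, v, a = position(t), velocity(t), acceleration(t)
speed = math.hypot(*v)
print("<v, r>      =", dot(v, r))
print("a + w^2 r   =", (a[0] + OMEGA**2 * r[0], a[1] + OMEGA**2 * r[1]))
print("|a|, v^2/r  =", math.hypot(*a), speed**2 / R)
```

Up to discretization and rounding error, the inner product and the vector $\mathbf{a} + \omega^2\mathbf{r}$ vanish, and the last two numbers agree.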

Starting from these formulas, let us compute, for example, the speed of a
low-altitude satellite of the Earth. In this case $r$ equals the radius of the earth,
that is, $r = 6400$ km, while $|\mathbf{a}(t)| = g$, where $g \approx 10\ \mathrm{m/s^2}$ is the acceleration
of free fall at the surface of the earth.

Thus, $v^2 = |\mathbf{a}(t)|\, r \approx 10\ \mathrm{m/s^2} \times 64 \cdot 10^5\ \mathrm{m} = 64 \cdot 10^6\ (\mathrm{m/s})^2$, and so
$v \approx 8 \cdot 10^3$ m/s.

Example 6. The optic property of a parabolic mirror. Let us consider the
parabola $y = \frac{1}{2p} x^2$ ($p > 0$, see Fig. 5.4), and construct the tangent to it at
the point $(x_0, y_0) = \left(x_0, \frac{1}{2p} x_0^2\right)$.

Since $f(x) = \frac{1}{2p} x^2$, we have

$$f'(x_0) = \lim_{x \to x_0} \frac{\frac{1}{2p} x^2 - \frac{1}{2p} x_0^2}{x - x_0} = \frac{1}{2p} \lim_{x \to x_0} (x + x_0) = \frac{1}{p} x_0.$$

Hence the required tangent has the equation

$$y - \frac{1}{2p} x_0^2 = \frac{1}{p} x_0 (x - x_0)$$

or

$$\frac{1}{p} x_0 (x - x_0) - (y - y_0) = 0, \tag{5.25}$$

where $y_0 = \frac{1}{2p} x_0^2$.


Fig. 5.4. The parabola with the point of tangency $(x_0, y_0)$.


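The vector $\mathbf{n} = \left(-\frac{1}{p} x_0, 1\right)$, as can be seen from this last equation, is
orthogonal to the line whose equation is (5.25). We shall show that the vectors
$\mathbf{e}_y = (0, 1)$ and $\mathbf{e}_f = \left(-x_0, \frac{p}{2} - y_0\right)$ form equal angles with $\mathbf{n}$. The vector $\mathbf{e}_y$
is a unit vector directed along the $y$-axis, while $\mathbf{e}_f$ is directed from the point
of tangency $(x_0, y_0) = \left(x_0, \frac{1}{2p} x_0^2\right)$ to the point $\left(0, \frac{p}{2}\right)$, which is the focus of
the parabola. Thus

$$\cos \widehat{\mathbf{e}_y \mathbf{n}} = \frac{\langle \mathbf{e}_y, \mathbf{n} \rangle}{|\mathbf{e}_y|\,|\mathbf{n}|} = \frac{1}{|\mathbf{n}|},$$

$$\cos \widehat{\mathbf{e}_f \mathbf{n}} = \frac{\langle \mathbf{e}_f, \mathbf{n} \rangle}{|\mathbf{e}_f|\,|\mathbf{n}|} = \frac{\frac{1}{p} x_0^2 + \frac{p}{2} - \frac{1}{2p} x_0^2}{|\mathbf{n}| \sqrt{x_0^2 + \left(\frac{p}{2} - \frac{1}{2p} x_0^2\right)^2}} = \frac{\frac{p}{2} + \frac{1}{2p} x_0^2}{|\mathbf{n}| \sqrt{\left(\frac{p}{2} + \frac{1}{2p} x_0^2\right)^2}} = \frac{1}{|\mathbf{n}|}.$$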


Thus we have shown that a wave source located at the point $\left(0, \frac{p}{2}\right)$, the
focus of the parabola, will emit a ray that, after reflection, is parallel to the
axis of the mirror (the $y$-axis), and that a wave arriving parallel to the axis of
the mirror will pass through the focus (see Fig. 5.4).
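
The optical property can also be checked by direct computation: reflect a ray travelling parallel to the $y$-axis in the tangent line at $(x_0, y_0)$ and test whether the reflected direction points at the focus. The sketch below is illustrative only; the function names, the value $p = 1.5$, and the sample points are assumptions, and the reflection law $d' = d - 2\frac{\langle d, n\rangle}{\langle n, n\rangle} n$ is the standard mirror formula rather than something derived in the text.

```python
def reflect(d, n):
    """Reflect direction d in a line with normal n (n need not be a unit vector)."""
    k = 2 * (d[0] * n[0] + d[1] * n[1]) / (n[0] ** 2 + n[1] ** 2)
    return (d[0] - k * n[0], d[1] - k * n[1])

def misses_focus_by(x0, p):
    """Cross product of (focus - point) with the reflected direction.

    A value of 0 means the reflected ray lies on the line through the focus.
    """
    y0 = x0 ** 2 / (2 * p)
    n = (-x0 / p, 1.0)                 # normal to the tangent (5.25)
    dx, dy = reflect((0.0, -1.0), n)   # incoming ray parallel to the y-axis
    fx, fy = 0.0 - x0, p / 2 - y0      # vector from the point to the focus (0, p/2)
    return fx * dy - fy * dx

for x0 in (-3.0, 0.5, 2.0):
    print(x0, misses_focus_by(x0, p=1.5))
```

All printed values are zero up to rounding, in agreement with the equality of the two cosines computed above.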


Example 7. With this example we shall show that the tangent is merely the 
best linear approximation to the graph of a function in a neighborhood of the 
point of tangency and does not necessarily have only one point in common 
with the curve, as was the case with a circle, or in general, with convex curves. 
(For convex curves we shall give a separate discussion.) 

Let the function be given by

$$f(x) = \begin{cases} x^2 \sin \frac{1}{x}, & \text{if } x \ne 0, \\ 0, & \text{if } x = 0. \end{cases}$$

The graph of this function is shown by the thick line in Fig. 5.5.

Fig. 5.5. The graph of $f$, oscillating between the parabolas $y = x^2$ and $y = -x^2$.


Let us find the tangent to the graph at the point $(0, 0)$. Since

$$f'(0) = \lim_{x \to 0} \frac{x^2 \sin \frac{1}{x} - 0}{x - 0} = \lim_{x \to 0} x \sin \frac{1}{x} = 0,$$

the tangent has the equation $y - 0 = 0 \cdot (x - 0)$, or simply $y = 0$.
Thus, in this example the tangent is the $x$-axis, which the graph intersects
infinitely many times in any neighborhood of the point of tangency.
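
Both facts about this function can be observed numerically. In the sketch below (the sample values of $h$ and the range of $k$ are arbitrary choices), the difference quotient at 0 equals $h \sin\frac{1}{h}$ and is therefore bounded by $|h|$, while the points $x = \frac{1}{k\pi}$, which accumulate at 0, are zeros of $f$.

```python
import math

def f(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

# Difference quotient at 0: (f(h) - f(0)) / h = h * sin(1/h), bounded by |h|.
for h in (1e-1, 1e-3, 1e-5):
    print(h, (f(h) - f(0.0)) / h)

# The points 1/(k*pi) are zeros of f, so the graph meets the tangent y = 0
# in every neighborhood of the origin.
zeros = [1.0 / (k * math.pi) for k in range(1, 6)]
print([f(z) for z in zeros])
```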


By the definition of differentiability of a function $f : E \to \mathbb{R}$ at a point
$x_0 \in E$, we have

$$f(x) - f(x_0) = A(x_0)(x - x_0) + o(x - x_0) \quad \text{as } x \to x_0,\ x \in E.$$

Since the right-hand side of this equality tends to zero as $x \to x_0$, $x \in E$,
it follows that $\lim_{E \ni x \to x_0} f(x) = f(x_0)$, so that a function that is differentiable
at a point is necessarily continuous at that point.
We shall show that the converse, of course, is not always true.


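Example 8. Let $f(x) = |x|$ (Fig. 5.6). Then at the point $x_0 = 0$ we have

$$\lim_{x \to x_0 - 0} \frac{f(x) - f(x_0)}{x - x_0} = \lim_{x \to -0} \frac{|x| - 0}{x - 0} = \lim_{x \to -0} \frac{-x}{x} = -1,$$

$$\lim_{x \to x_0 + 0} \frac{f(x) - f(x_0)}{x - x_0} = \lim_{x \to +0} \frac{|x| - 0}{x - 0} = \lim_{x \to +0} \frac{x}{x} = 1.$$

Consequently, at this point the function has no derivative and hence is
not differentiable at the point.

Fig. 5.6.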


Example 9. We shall show that $e^{x+h} - e^x = e^x h + o(h)$ as $h \to 0$.
Thus, the function $\exp(x) = e^x$ is differentiable and $d\exp(x)h = \exp(x)h$,
or $de^x = e^x dx$, and therefore $\exp' x = \exp x$, or $\frac{de^x}{dx} = e^x$.

Proof.

$$e^{x+h} - e^x = e^x(e^h - 1) = e^x(h + o(h)) = e^x h + o(h).$$

Here we have used the formula $e^h - 1 = h + o(h)$ obtained in Example 39 of
Subsect. 3.2.4. □


Example 10. If $a > 0$, then $a^{x+h} - a^x = a^x(\ln a)h + o(h)$ as $h \to 0$. Thus
$da^x = a^x(\ln a)dx$ and $\frac{da^x}{dx} = a^x \ln a$.

Proof.

$$a^{x+h} - a^x = a^x(a^h - 1) = a^x(e^{h \ln a} - 1) = a^x(h \ln a + o(h \ln a)) = a^x(\ln a)h + o(h) \quad \text{as } h \to 0. \quad \Box$$


Example 11. If $x \ne 0$, then $\ln|x + h| - \ln|x| = \frac{1}{x}h + o(h)$ as $h \to 0$. Thus
$d\ln|x| = \frac{1}{x}dx$ and $\frac{d\ln|x|}{dx} = \frac{1}{x}$.

Proof.

$$\ln|x + h| - \ln|x| = \ln\left|1 + \frac{h}{x}\right|.$$

For $|h| < |x|$ we have $\left|1 + \frac{h}{x}\right| = 1 + \frac{h}{x}$, and so for sufficiently small values of
$h$ we can write

$$\ln|x + h| - \ln|x| = \ln\left(1 + \frac{h}{x}\right) = \frac{h}{x} + o\left(\frac{h}{x}\right) = \frac{1}{x}h + o(h)$$

as $h \to 0$. Here we have used the relation $\ln(1 + t) = t + o(t)$ as $t \to 0$, shown
in Example 38 of Subsect. 3.2.4. □


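Example 12. If $x \ne 0$ and $0 < a \ne 1$, then $\log_a|x + h| - \log_a|x| = \frac{1}{x \ln a}h + o(h)$
as $h \to 0$. Thus, $d\log_a|x| = \frac{1}{x \ln a}dx$ and $\frac{d\log_a|x|}{dx} = \frac{1}{x \ln a}$.

Proof.

$$\log_a|x + h| - \log_a|x| = \log_a\left|1 + \frac{h}{x}\right| = \log_a\left(1 + \frac{h}{x}\right) = \frac{1}{\ln a}\ln\left(1 + \frac{h}{x}\right) = \frac{1}{\ln a}\left(\frac{h}{x} + o\left(\frac{h}{x}\right)\right) = \frac{1}{x \ln a}h + o(h).$$

Here we have used the formula for transition from one base of logarithms
to another and the considerations explained in Example 11. □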


5.1.6 Problems and Exercises 


1. Show that
a) the tangent to the ellipse

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

at the point $(x_0, y_0)$ has the equation

$$\frac{x x_0}{a^2} + \frac{y y_0}{b^2} = 1;$$

b) light rays from a source located at a focus $F_1 = \left(-\sqrt{a^2 - b^2}, 0\right)$ or
$F_2 = \left(\sqrt{a^2 - b^2}, 0\right)$ of an ellipse with semiaxes $a > b > 0$ are gathered at the
other focus by an elliptical mirror.
2. Write the formulas for approximate computation of the following values:

a) $\sin\left(\frac{\pi}{6} + \alpha\right)$ for values of $\alpha$ near 0;

b) $\sin(30^\circ + \alpha^\circ)$ for values of $\alpha^\circ$ near 0;

c) $\cos\left(\frac{\pi}{4} + \alpha\right)$ for values of $\alpha$ near 0;

d) $\cos(45^\circ + \alpha^\circ)$ for values of $\alpha^\circ$ near 0.

3. A glass of water is rotating about its axis at constant angular velocity $\omega$. Let
$y = f(x)$ denote the equation of the curve obtained by cutting the surface of the
liquid with a plane passing through its axis of rotation.

a) Show that $f'(x) = \frac{\omega^2}{g}x$, where $g$ is the acceleration of free fall. (See Example 5.)

b) Choose a function $f(x)$ that satisfies the condition given in part a). (See
Example 6.)

c) Does the condition on the function $f(x)$ given in part a) change if its axis of
rotation does not coincide with the axis of the glass?


4. A body that can be regarded as a point mass is sliding down a smooth hill under
the influence of gravity. The hill is the graph of a differentiable function $y = f(x)$.

a) Find the horizontal and vertical components of the acceleration vector that
the body has at the point $(x_0, y_0)$.

b) For the case $f(x) = x^2$ when the body slides from a great height, find the
point of the parabola $y = x^2$ at which the horizontal component of the acceleration
is maximal.


5. Set

$$\varphi_0(x) = \begin{cases} x, & \text{if } 0 \le x \le \frac{1}{2}, \\ 1 - x, & \text{if } \frac{1}{2} \le x \le 1, \end{cases}$$

and extend this function to the entire real line so as to have period 1. We denote
the extended function by $\varphi_0$. Further, let

$$\varphi_n(x) = \frac{1}{4^n}\varphi_0(4^n x).$$

The function $\varphi_n$ has period $4^{-n}$ and a derivative equal to $+1$ or $-1$ everywhere
except at the points $x = \frac{k}{2^{2n+1}}$, $k \in \mathbb{Z}$. Let

$$f(x) = \sum_{n=1}^{\infty} \varphi_n(x).$$

Show that the function $f$ is defined and continuous on $\mathbb{R}$, but does not have a
derivative at any point. (This example is due to the well-known Dutch mathematician
B. L. van der Waerden (1903-1996). The first examples of continuous functions
having no derivatives were constructed by Bolzano (1830) and Weierstrass (1860).)

5.2 The Basic Rules of Differentiation


Constructing the differential of a given function or, equivalently, the process
of finding its derivative, is called differentiation.⁷


5.2.1 Differentiation and the Arithmetic Operations 


Theorem 1. If functions $f : X \to \mathbb{R}$ and $g : X \to \mathbb{R}$ are differentiable at a
point $x \in X$, then
a) their sum is differentiable at $x$, and

$$(f + g)'(x) = (f' + g')(x);$$

b) their product is differentiable at $x$, and

$$(f \cdot g)'(x) = f'(x) \cdot g(x) + f(x) \cdot g'(x);$$

c) their quotient is differentiable at $x$ if $g(x) \ne 0$, and

$$\left(\frac{f}{g}\right)'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{g^2(x)}.$$


Proof. In the proof we shall rely on the definition of a differentiable function
and the properties of the symbol $o(\cdot)$ proved in Subsect. 3.2.4.

a)
$$\begin{aligned}
(f + g)(x + h) - (f + g)(x) &= (f(x + h) + g(x + h)) - (f(x) + g(x)) \\
&= (f(x + h) - f(x)) + (g(x + h) - g(x)) \\
&= (f'(x)h + o(h)) + (g'(x)h + o(h)) = (f'(x) + g'(x))h + o(h) \\
&= (f' + g')(x)h + o(h).
\end{aligned}$$

b)
$$\begin{aligned}
(f \cdot g)(x + h) - (f \cdot g)(x) &= f(x + h)g(x + h) - f(x)g(x) \\
&= (f(x) + f'(x)h + o(h))(g(x) + g'(x)h + o(h)) - f(x)g(x) \\
&= (f'(x)g(x) + f(x)g'(x))h + o(h).
\end{aligned}$$


c) Since a function that is differentiable at a point $x \in X$ is continuous at
that point, taking account of the relation $g(x) \ne 0$ and the properties of
continuous functions, we can guarantee that $g(x + h) \ne 0$ for sufficiently
small values of $h$. In the following computations it is assumed that $h$ is small:

$$\begin{aligned}
\left(\frac{f}{g}\right)(x + h) - \left(\frac{f}{g}\right)(x) &= \frac{f(x + h)}{g(x + h)} - \frac{f(x)}{g(x)} = \frac{1}{g(x)g(x + h)}\big(f(x + h)g(x) - f(x)g(x + h)\big) \\
&= \left(\frac{1}{g^2(x)} + o(1)\right)\big((f(x) + f'(x)h + o(h))g(x) - f(x)(g(x) + g'(x)h + o(h))\big) \\
&= \left(\frac{1}{g^2(x)} + o(1)\right)\big((f'(x)g(x) - f(x)g'(x))h + o(h)\big) \\
&= \frac{f'(x)g(x) - f(x)g'(x)}{g^2(x)}h + o(h).
\end{aligned}$$

Here we have used the continuity of $g$ at the point $x$ and the relation
$g(x) \ne 0$ to deduce that

$$\lim_{h \to 0} \frac{1}{g(x)g(x + h)} = \frac{1}{g^2(x)},$$

that is,

$$\frac{1}{g(x)g(x + h)} = \frac{1}{g^2(x)} + o(1),$$

where $o(1)$ is infinitesimal as $h \to 0$, $x + h \in X$. □

7 Although the problems of finding the differential and finding the derivative are
mathematically equivalent, the derivative and the differential are nevertheless
not the same thing. For that reason, for example, there are two terms in French:
dérivation, for finding the derivative, and différentiation, for finding the differential.
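
The three rules of Theorem 1 are exactly what is needed to propagate a pair (value, derivative) through arithmetic operations. The sketch below illustrates this idea; the class name `Dual` and the sample functions $f(x) = x^2$, $g(x) = 1 + x$ are assumptions made for the example, not notation from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """The pair (value, derivative) of a function at one fixed point x."""
    val: float
    der: float

    def __add__(self, other):
        # Rule a): (f + g)' = f' + g'
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        # Rule b): (f g)' = f' g + f g'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    def __truediv__(self, other):
        # Rule c): (f / g)' = (f' g - f g') / g^2, valid when g(x) != 0
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der) / other.val ** 2)

x = 2.0
f = Dual(x * x, 2 * x)    # f(x) = x^2,   f'(x) = 2x
g = Dual(1.0 + x, 1.0)    # g(x) = 1 + x, g'(x) = 1

print(f + g)   # derivative of x^2 + 1 + x at x = 2
print(f * g)   # derivative of x^2 (1 + x) at x = 2
print(f / g)   # derivative of x^2 / (1 + x) at x = 2
```

At $x = 2$ the derivatives are $5$, $16$, and $8/9$ respectively, which can be confirmed by differentiating the three expressions by hand.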


Corollary 1. The derivative of a linear combination of differentiable func- 
tions equals the same linear combination of the derivatives of these functions. 


Proof. Since a constant function is obviously differentiable and has a derivative
equal to 0 at every point, taking $f \equiv \text{const} = c$ in statement b) of
Theorem 1, we find $(cg)'(x) = cg'(x)$.

Now, using statement a) of Theorem 1, we can write

$$(c_1 f + c_2 g)'(x) = (c_1 f)'(x) + (c_2 g)'(x) = c_1 f'(x) + c_2 g'(x).$$

Taking account of what has just been proved, we verify by induction that

$$(c_1 f_1 + \cdots + c_n f_n)'(x) = c_1 f_1'(x) + \cdots + c_n f_n'(x). \quad \Box$$
Corollary 2. If the functions $f_1, \ldots, f_n$ are differentiable at $x$, then

$$(f_1 \cdots f_n)'(x) = f_1'(x) f_2(x) \cdots f_n(x) + f_1(x) f_2'(x) f_3(x) \cdots f_n(x) + \cdots + f_1(x) \cdots f_{n-1}(x) f_n'(x).$$

Proof. For $n = 1$ the statement is obvious.

If it holds for some $n \in \mathbb{N}$, then by statement b) of Theorem 1 it also
holds for $(n + 1) \in \mathbb{N}$. By the principle of induction, we conclude that the
formula is valid for any $n \in \mathbb{N}$. □

Corollary 3. It follows from the relation between the derivative and the
differential that Theorem 1 can also be written in terms of differentials. To be
specific:
a) $d(f + g)(x) = df(x) + dg(x)$;
b) $d(f \cdot g)(x) = g(x)df(x) + f(x)dg(x)$;
c) $d\left(\frac{f}{g}\right)(x) = \frac{g(x)df(x) - f(x)dg(x)}{g^2(x)}$ if $g(x) \ne 0$.

Proof. Let us verify, for example, statement a).

$$\begin{aligned}
d(f + g)(x)h &= (f + g)'(x)h = (f' + g')(x)h \\
&= (f'(x) + g'(x))h = f'(x)h + g'(x)h \\
&= df(x)h + dg(x)h = (df(x) + dg(x))h,
\end{aligned}$$

and we have verified that $d(f + g)(x)$ and $df(x) + dg(x)$ are the same function. □


Example 1. Invariance of the definition of velocity. We are now in a position
to verify that the instantaneous velocity vector of a point mass defined in
Subsect. 5.1.1 is independent of the Cartesian coordinate system used to
define it. In fact we shall verify this for all affine coordinate systems.

Let $(x^1, x^2)$ and $(\tilde{x}^1, \tilde{x}^2)$ be the coordinates of the same point of the plane
in two different coordinate systems connected by the relations

$$\begin{aligned}
\tilde{x}^1 &= a^1_1 x^1 + a^1_2 x^2 + b^1, \\
\tilde{x}^2 &= a^2_1 x^1 + a^2_2 x^2 + b^2.
\end{aligned} \tag{5.26}$$

Since any vector (in affine space) is determined by a pair of points and its
coordinates are the differences of the coordinates of the terminal and initial
points of the vector, it follows that the coordinates of a given vector in these
two coordinate systems must be connected by the relations

$$\begin{aligned}
\tilde{v}^1 &= a^1_1 v^1 + a^1_2 v^2, \\
\tilde{v}^2 &= a^2_1 v^1 + a^2_2 v^2.
\end{aligned} \tag{5.27}$$

If the law of motion of the point is given by functions $x^1(t)$ and $x^2(t)$ in
one system of coordinates, it is given in the other system by functions $\tilde{x}^1(t)$
and $\tilde{x}^2(t)$ connected with the first set by relations (5.26).

Differentiating relations (5.26) with respect to $t$, we find by the rules for
differentiation

$$\begin{aligned}
\dot{\tilde{x}}^1 &= a^1_1 \dot{x}^1 + a^1_2 \dot{x}^2, \\
\dot{\tilde{x}}^2 &= a^2_1 \dot{x}^1 + a^2_2 \dot{x}^2.
\end{aligned} \tag{5.28}$$


Thus the coordinates $(v^1, v^2) = (\dot{x}^1, \dot{x}^2)$ of the velocity vector in the first
system and the coordinates $(\tilde{v}^1, \tilde{v}^2) = (\dot{\tilde{x}}^1, \dot{\tilde{x}}^2)$ of the velocity vector in the
second system are connected by relations (5.27), telling us that we are dealing
with two different expressions for the same vector.

Example 2. Let $f(x) = \tan x$. We shall show that $f'(x) = \frac{1}{\cos^2 x}$ at every
point where $\cos x \ne 0$, that is, in the domain of definition of the function
$\tan x = \frac{\sin x}{\cos x}$.

It was shown in Examples 1 and 2 of Sect. 5.1 that $\sin'(x) = \cos x$ and
$\cos' x = -\sin x$, so that by statement c) of Theorem 1 we find, when $\cos x \ne 0$,

$$\tan' x = \left(\frac{\sin}{\cos}\right)'(x) = \frac{\sin' x \cos x - \sin x \cos' x}{\cos^2 x} = \frac{\cos x \cos x + \sin x \sin x}{\cos^2 x} = \frac{1}{\cos^2 x}.$$
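Example 3. $\cot' x = -\frac{1}{\sin^2 x}$ wherever $\sin x \ne 0$, that is, in the domain of
definition of $\cot x = \frac{\cos x}{\sin x}$.
Indeed,

$$\cot' x = \left(\frac{\cos}{\sin}\right)'(x) = \frac{\cos' x \sin x - \cos x \sin' x}{\sin^2 x} = \frac{-\sin x \sin x - \cos x \cos x}{\sin^2 x} = -\frac{1}{\sin^2 x}.$$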


Example 4. If $P(x) = c_0 + c_1 x + \cdots + c_n x^n$ is a polynomial, then
$P'(x) = c_1 + 2c_2 x + \cdots + nc_n x^{n-1}$.

Indeed, since $\frac{dx}{dx} = 1$, by Corollary 2 we have $\frac{dx^n}{dx} = nx^{n-1}$, and the
statement now follows from Corollary 1.


5.2.2 Differentiation of a Composite Function (chain rule) 


Theorem 2. (Differentiation of a composite function). If the function
$f : X \to Y \subset \mathbb{R}$ is differentiable at a point $x \in X$ and the function $g : Y \to \mathbb{R}$
is differentiable at the point $y = f(x) \in Y$, then the composite function
$g \circ f : X \to \mathbb{R}$ is differentiable at $x$, and the differential
$d(g \circ f)(x) : T\mathbb{R}(x) \to T\mathbb{R}(g(f(x)))$ of their composition equals the composition
$dg(y) \circ df(x)$ of their differentials

$$df(x) : T\mathbb{R}(x) \to T\mathbb{R}(y = f(x)) \quad \text{and} \quad dg(y = f(x)) : T\mathbb{R}(y) \to T\mathbb{R}(g(y)).$$


Proof. The conditions for differentiability of the functions $f$ and $g$ have the
form

$$\begin{aligned}
f(x + h) - f(x) &= f'(x)h + o(h) \quad \text{as } h \to 0,\ x + h \in X, \\
g(y + t) - g(y) &= g'(y)t + o(t) \quad \text{as } t \to 0,\ y + t \in Y.
\end{aligned}$$

We remark that in the second equality here the function $o(t)$ can be
considered to be defined for $t = 0$, and in the representation $o(t) = \gamma(t)t$,
where $\gamma(t) \to 0$ as $t \to 0$, $y + t \in Y$, we may assume $\gamma(0) = 0$. Setting
$f(x) = y$ and $f(x + h) = y + t$, by the differentiability (and hence continuity)
of $f$ at the point $x$ we conclude that $t \to 0$ as $h \to 0$, and if $x + h \in X$, then
$y + t \in Y$. By the theorem on the limit of a composite function, we now have

$$\gamma(f(x + h) - f(x)) = \alpha(h) \to 0 \quad \text{as } h \to 0,\ x + h \in X,$$

and thus if $t = f(x + h) - f(x)$, then

$$\begin{aligned}
o(t) &= \gamma(f(x + h) - f(x))(f(x + h) - f(x)) \\
&= \alpha(h)(f'(x)h + o(h)) = \alpha(h)f'(x)h + \alpha(h)o(h) \\
&= o(h) + o(h) = o(h) \quad \text{as } h \to 0,\ x + h \in X.
\end{aligned}$$


Next,

$$\begin{aligned}
(g \circ f)(x + h) - (g \circ f)(x) &= g(f(x + h)) - g(f(x)) \\
&= g(y + t) - g(y) = g'(y)t + o(t) \\
&= g'(f(x))(f(x + h) - f(x)) + o(f(x + h) - f(x)) \\
&= g'(f(x))(f'(x)h + o(h)) + o(f(x + h) - f(x)) \\
&= g'(f(x))(f'(x)h) + g'(f(x))(o(h)) + o(f(x + h) - f(x)).
\end{aligned}$$

Since we can interpret the quantity $g'(f(x))(f'(x)h)$ as the value
$dg(f(x)) \circ df(x)h$ of the composition $h \mapsto g'(f(x)) \cdot f'(x)h$ of the
mappings $h \mapsto f'(x)h$ and $\tau \mapsto g'(y)\tau$ at the displacement $h$, to complete
the proof it remains only for us to remark that the sum

$$g'(f(x))(o(h)) + o(f(x + h) - f(x))$$

is infinitesimal compared with $h$ as $h \to 0$, $x + h \in X$, since, as we have already
established,

$$o(f(x + h) - f(x)) = o(h) \quad \text{as } h \to 0,\ x + h \in X.$$

Thus we have proved that

$$(g \circ f)(x + h) - (g \circ f)(x) = g'(f(x)) \cdot f'(x)h + o(h) \quad \text{as } h \to 0,\ x + h \in X. \quad \Box$$


Corollary 4. The derivative $(g \circ f)'(x)$ of the composition of differentiable
real-valued functions equals the product $g'(f(x)) \cdot f'(x)$ of the derivatives of
these functions computed at the corresponding points.


There is a strong temptation to give a short proof of this last statement
in Leibniz' notation for the derivative, in which if $z = z(y)$ and $y = y(x)$, we
have

$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx},$$

which appears to be completely natural, if one regards the symbol $\frac{dz}{dy}$ or $\frac{dy}{dx}$
not as a unit, but as the ratio of $dz$ to $dy$ or $dy$ to $dx$.
The idea for a proof that thereby arises is to consider the difference quotient

$$\frac{\Delta z}{\Delta x} = \frac{\Delta z}{\Delta y} \cdot \frac{\Delta y}{\Delta x}$$

and then pass to the limit as $\Delta x \to 0$. The difficulty that arises here (which
we also have had to deal with in part!) is that $\Delta y$ may be 0 even if $\Delta x \ne 0$.


Corollary 5. If the composition $(f_n \circ \cdots \circ f_1)(x)$ of differentiable functions
$y_1 = f_1(x), \ldots, y_n = f_n(y_{n-1})$ exists, then

$$(f_n \circ \cdots \circ f_1)'(x) = f_n'(y_{n-1}) f_{n-1}'(y_{n-2}) \cdots f_1'(x).$$

Proof. The statement is obvious if $n = 1$.
If it holds for some $n \in \mathbb{N}$, then by Theorem 2 it also holds for $n + 1$, so
that by the principle of induction, it holds for any $n \in \mathbb{N}$. □
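
The formula of Corollary 5 translates directly into a loop: walk through $f_1, \ldots, f_n$ in order, multiplying the accumulated derivative by $f_k'(y_{k-1})$ and then advancing $y_{k-1}$ to $y_k$. The following is a minimal sketch; the function name `chain_derivative` and the sample composition $\exp(\sin(x^2))$ are assumptions chosen for illustration.

```python
import math

def chain_derivative(funcs_with_derivs, x):
    """Evaluate a composition and its derivative at x.

    funcs_with_derivs is a list of pairs (f_k, f_k') applied in the order
    f_1, f_2, ..., f_n. Returns ((f_n o ... o f_1)(x), (f_n o ... o f_1)'(x)).
    """
    value, deriv = x, 1.0
    for f, df in funcs_with_derivs:
        deriv *= df(value)   # multiply by f_k'(y_{k-1})
        value = f(value)     # y_k = f_k(y_{k-1})
    return value, deriv

# h(x) = exp(sin(x^2)), so h'(x) = exp(sin(x^2)) * cos(x^2) * 2x
steps = [
    (lambda x: x * x, lambda x: 2 * x),
    (math.sin, math.cos),
    (math.exp, math.exp),
]
value, deriv = chain_derivative(steps, 0.8)
print(value, deriv)
```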


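Example 5. Let us show that for $\alpha \in \mathbb{R}$ we have $\frac{dx^\alpha}{dx} = \alpha x^{\alpha - 1}$ in the domain
$x > 0$, that is, $dx^\alpha = \alpha x^{\alpha - 1}dx$ and

$$(x + h)^\alpha - x^\alpha = \alpha x^{\alpha - 1}h + o(h) \quad \text{as } h \to 0.$$

Proof. We write $x^\alpha = e^{\alpha \ln x}$ and apply the theorem, taking account of the
results of Examples 9 and 11 from Sect. 5.1 and statement b) of Theorem 1.
Let $g(y) = e^y$ and $y = f(x) = \alpha \ln(x)$. Then $x^\alpha = (g \circ f)(x)$ and

$$(g \circ f)'(x) = g'(y) \cdot f'(x) = e^y \cdot \frac{\alpha}{x} = e^{\alpha \ln x} \cdot \frac{\alpha}{x} = \alpha x^{\alpha - 1}. \quad \Box$$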


Example 6. The derivative of the logarithm of the absolute value of a
differentiable function is often called its logarithmic derivative.

Since $F(x) = \ln|f(x)| = (\ln \circ\, |\cdot| \circ f)(x)$, by Example 11 of Sect. 5.1, we
have $F'(x) = (\ln|f|)'(x) = \frac{f'(x)}{f(x)}$.

Thus

$$d(\ln|f|)(x) = \frac{f'(x)}{f(x)}dx = \frac{df(x)}{f(x)}.$$


Example 7. The absolute and relative errors in the value of a differentiable
function caused by errors in the data for the argument.
If the function $f$ is differentiable at $x$, then

$$f(x + h) - f(x) = f'(x)h + \alpha(x; h),$$

where $\alpha(x; h) = o(h)$ as $h \to 0$.

Thus, if in computing the value $f(x)$ of a function, the argument $x$ is
determined with absolute error $h$, the absolute error $|f(x + h) - f(x)|$ in the
value of the function due to this error in the argument can be replaced for
small values of $h$ by the absolute value of the differential $|df(x)h| = |f'(x)h|$
at displacement $h$.
The relative error can then be computed as the ratio $\frac{|f'(x)h|}{|f(x)|} = \frac{|df(x)h|}{|f(x)|}$ or
as the absolute value of the product $\left|\frac{f'(x)}{f(x)}\right| |h|$ of the logarithmic derivative
of the function and the magnitude of the absolute error in the argument.
We remark by the way that if $f(x) = \ln x$, then $d\ln x = \frac{dx}{x}$, and the
absolute error in determining the value of a logarithm equals the relative
error in the argument. This circumstance can be beautifully exploited, for
example, in the slide rule (and many other devices with nonuniform scales).
To be specific, let us imagine that with each point of the real line lying right
of zero we connect its coordinate $y$ and write it down above the point, while
below the point we write the number $x = e^y$. Then $y = \ln x$. The same real
half-line has now been endowed with a uniform scale $y$ and a nonuniform
scale $x$ (called logarithmic). To find $\ln x$, one need only set the cursor on the
number $x$ and read the corresponding number $y$ written above it. Since the
precision in setting the cursor on a particular point is independent of the
number $x$ or $y$ corresponding to it and is measured by some quantity $\Delta y$ (the
length of the interval of possible deviation) on the uniform scale, we shall
have approximately the same absolute error in determining both a number
$x$ and its logarithm $y$; and in determining a number from its logarithm we
shall have approximately the same relative error in all parts of the scale.
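
The first-order error estimates of this example are easy to tabulate. In the sketch below the helper name `error_estimates` and the sample data ($x = 50$ known to within $h = 0.5$, that is, to within 1%) are assumptions for illustration; the case $f = \ln$ shows the absolute error of the logarithm matching the relative error of the argument.

```python
import math

def error_estimates(f, df, x, h):
    """First-order estimates of the errors in f(x) caused by an error h in x."""
    absolute = abs(df(x) * h)          # |df(x)h| = |f'(x)h|
    relative = absolute / abs(f(x))    # |f'(x)/f(x)| * |h|
    return absolute, relative

x, h = 50.0, 0.5                       # argument known to within 1%
abs_est, rel_est = error_estimates(math.log, lambda t: 1.0 / t, x, h)
actual = abs(math.log(x + h) - math.log(x))

print("estimated absolute error of ln x:", abs_est)
print("actual absolute error of ln x:   ", actual)
print("relative error of the argument:  ", h / x)
print("estimated relative error of ln x:", rel_est)
```

The estimate $|df(x)h|$ differs from the actual error only by a quantity of order $h^2$, which is the content of the relation $\alpha(x; h) = o(h)$.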


Example 8. Let us differentiate a function $u(x)^{v(x)}$, where $u(x)$ and $v(x)$ are
differentiable functions and $u(x) > 0$. We write $u(x)^{v(x)} = e^{v(x)\ln u(x)}$ and
use Corollary 5. Then

$$\frac{de^{v(x)\ln u(x)}}{dx} = e^{v(x)\ln u(x)}\left(v'(x)\ln u(x) + v(x)\frac{u'(x)}{u(x)}\right) = u(x)^{v(x)} \cdot v'(x)\ln u(x) + v(x)u(x)^{v(x) - 1} \cdot u'(x).$$
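
As a quick check of this formula, one can compare it with a central difference quotient. The sketch below is illustrative only: the choices $u(x) = 1 + x^2$, $v(x) = \sin x$, the point $x = 0.9$, and the name `power_derivative` are assumptions.

```python
import math

def power_derivative(u, du, v, dv, x):
    """Derivative of u(x)**v(x) by the formula of Example 8 (requires u(x) > 0)."""
    return (u(x) ** v(x) * dv(x) * math.log(u(x))
            + v(x) * u(x) ** (v(x) - 1) * du(x))

def u(x):
    return 1 + x * x

def du(x):
    return 2 * x

v, dv = math.sin, math.cos

x, h = 0.9, 1e-6
numeric = (u(x + h) ** v(x + h) - u(x - h) ** v(x - h)) / (2 * h)
print(power_derivative(u, du, v, dv, x), numeric)
```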


5.2.3 Differentiation of an Inverse Function 


Theorem 3. (The derivative of an inverse function). Let the functions
$f : X \to Y$ and $f^{-1} : Y \to X$ be mutually inverse and continuous at points
$x_0 \in X$ and $f(x_0) = y_0 \in Y$ respectively. If $f$ is differentiable at $x_0$ and
$f'(x_0) \ne 0$, then $f^{-1}$ is also differentiable at the point $y_0$, and

$$(f^{-1})'(y_0) = (f'(x_0))^{-1}.$$


Proof. Since the functions $f : X \to Y$ and $f^{-1} : Y \to X$ are mutually inverse,
the quantities $f(x) - f(x_0)$ and $f^{-1}(y) - f^{-1}(y_0)$, where $y = f(x)$, are both
nonzero if $x \ne x_0$. In addition, we conclude from the continuity of $f$ at $x_0$ and
$f^{-1}$ at $y_0$ that $(X \ni x \to x_0) \Leftrightarrow (Y \ni y \to y_0)$. Now using the theorem on
the limit of a composite function and the arithmetic properties of the limit,
we find

$$\lim_{Y \ni y \to y_0} \frac{f^{-1}(y) - f^{-1}(y_0)}{y - y_0} = \lim_{X \ni x \to x_0} \frac{x - x_0}{f(x) - f(x_0)} = \lim_{X \ni x \to x_0} \frac{1}{\left(\frac{f(x) - f(x_0)}{x - x_0}\right)} = \frac{1}{f'(x_0)}.$$

Thus we have shown that the function $f^{-1} : Y \to X$ has a derivative at
$y_0$ and that

$$(f^{-1})'(y_0) = (f'(x_0))^{-1}.$$
Remark 1. If we knew in advance that the function $f^{-1}$ was differentiable
at $y_0$, we would find immediately by the identity $(f^{-1} \circ f)(x) = x$ and the
theorem on differentiation of a composite function that $(f^{-1})'(y_0) \cdot f'(x_0) = 1$.

Remark 2. The condition $f'(x_0) \ne 0$ is obviously equivalent to the statement
that the mapping $h \mapsto f'(x_0)h$ realized by the differential $df(x_0) : T\mathbb{R}(x_0) \to T\mathbb{R}(y_0)$
has the inverse mapping $[df(x_0)]^{-1} : T\mathbb{R}(y_0) \to T\mathbb{R}(x_0)$ given by the
formula $\tau \mapsto (f'(x_0))^{-1}\tau$.
Hence, in terms of differentials we can write the second statement in
Theorem 3 as follows:

If a function $f$ is differentiable at a point $x_0$ and its differential
$df(x_0) : T\mathbb{R}(x_0) \to T\mathbb{R}(y_0)$ is invertible at that point, then the differential of the
function $f^{-1}$ inverse to $f$ exists at the point $y_0 = f(x_0)$ and is the mapping

$$df^{-1}(y_0) = [df(x_0)]^{-1} : T\mathbb{R}(y_0) \to T\mathbb{R}(x_0),$$

inverse to $df(x_0) : T\mathbb{R}(x_0) \to T\mathbb{R}(y_0)$.


Example 9. We shall show that $\arcsin' y = \frac{1}{\sqrt{1-y^2}}$ for $|y| < 1$. The functions 
$\sin : [-\pi/2, \pi/2] \to [-1, 1]$ and $\arcsin : [-1, 1] \to [-\pi/2, \pi/2]$ are mutually 
inverse and continuous (see Example 8 of Sect. 4.2) and $\sin'(x) = \cos x \ne 0$ if 
$|x| < \pi/2$. For $|x| < \pi/2$ we have $|y| < 1$ for the values $y = \sin x$. Therefore, 
by Theorem 3 

$$\arcsin' y = \frac{1}{\sin' x} = \frac{1}{\cos x} = \frac{1}{\sqrt{1 - \sin^2 x}} = \frac{1}{\sqrt{1 - y^2}}.$$

The sign in front of the radical is chosen taking account of the inequality 
$\cos x > 0$ for $|x| < \pi/2$. 
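The computation in Example 9 can be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper names `inverse_derivative` and `central_difference` and the sample point `y0 = 0.3` are illustrative assumptions. It compares the value given by Theorem 3, the closed form $1/\sqrt{1-y^2}$, and a symmetric difference quotient.

```python
import math

def inverse_derivative(f_prime, f_inverse, y0):
    """Derivative of the inverse at y0 by Theorem 3: 1 / f'(x0), where x0 = f^{-1}(y0)."""
    x0 = f_inverse(y0)
    return 1.0 / f_prime(x0)

def central_difference(g, y0, h=1e-6):
    """Symmetric difference quotient, used only as an independent numerical check."""
    return (g(y0 + h) - g(y0 - h)) / (2.0 * h)

y0 = 0.3  # any point with |y0| < 1 (illustrative choice)
by_theorem = inverse_derivative(math.cos, math.asin, y0)  # 1 / cos(arcsin y0)
closed_form = 1.0 / math.sqrt(1.0 - y0 * y0)              # formula of Example 9
numeric = central_difference(math.asin, y0)

print(by_theorem, closed_form, numeric)
```

The three printed values should agree up to the discretization error of the difference quotient.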


Example 10. Reasoning as in the preceding example, one can show (taking 
account of Example 9 of Sect. 4.2) that 

$$\arccos' y = -\frac{1}{\sqrt{1 - y^2}} \quad\text{for } |y| < 1.$$

Indeed, 

$$\arccos' y = \frac{1}{\cos' x} = -\frac{1}{\sin x} = -\frac{1}{\sqrt{1 - \cos^2 x}} = -\frac{1}{\sqrt{1 - y^2}}.$$

The sign in front of the radical is chosen taking account of the inequality 
$\sin x > 0$ if $0 < x < \pi$. 


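Example 11. $\arctan' y = \frac{1}{1+y^2}$, $y \in \mathbb{R}$. 

Indeed, 

$$\arctan' y = \frac{1}{\tan' x} = \frac{1}{\left(\frac{1}{\cos^2 x}\right)} = \cos^2 x = \frac{1}{1 + \tan^2 x} = \frac{1}{1 + y^2}.$$

Example 12. $\operatorname{arccot}' y = -\frac{1}{1+y^2}$, $y \in \mathbb{R}$. 

Indeed, 

$$\operatorname{arccot}' y = \frac{1}{\cot' x} = \frac{1}{\left(-\frac{1}{\sin^2 x}\right)} = -\sin^2 x = -\frac{1}{1 + \cot^2 x} = -\frac{1}{1 + y^2}.$$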


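Example 13. We already know (see Examples 10 and 12 of Sect. 5.1) that 
the functions $y = f(x) = a^x$ and $x = f^{-1}(y) = \log_a y$ have the derivatives 

$$f'(x) = a^x \ln a \quad\text{and}\quad (f^{-1})'(y) = \frac{1}{y \ln a}.$$

Let us see how this is consistent with Theorem 3: 

$$(f^{-1})'(y) = \frac{1}{f'(x)} = \frac{1}{a^x \ln a} = \frac{1}{y \ln a},$$

$$f'(x) = \frac{1}{(f^{-1})'(y)} = \frac{1}{\left(\frac{1}{y \ln a}\right)} = y \ln a = a^x \ln a.$$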


Example 14. The hyperbolic and inverse hyperbolic functions and their 
derivatives. The functions 

$$\sinh x = \frac{1}{2}(e^x - e^{-x}), \qquad \cosh x = \frac{1}{2}(e^x + e^{-x})$$

are called respectively the hyperbolic sine and hyperbolic cosine⁸ of $x$. 
These functions, which for the time being have been introduced purely 
formally, arise just as naturally in many problems as the circular functions 
$\sin x$ and $\cos x$. 
We remark that 

$$\sinh(-x) = -\sinh x, \qquad \cosh(-x) = \cosh x,$$

that is, the hyperbolic sine is an odd function and the hyperbolic cosine is 
an even function. 

⁸ From the Latin phrases sinus hyperbolici and cosinus hyperbolici. 

Moreover, the following basic identity is obvious: 

$$\cosh^2 x - \sinh^2 x = 1.$$

The graphs of the functions $y = \sinh x$ and $y = \cosh x$ are shown in 
Fig. 5.7. 

Fig. 5.7. 

It follows from the definition of $\sinh x$ and the properties of the function 
$e^x$ that $\sinh x$ is a continuous strictly increasing function mapping $\mathbb{R}$ in a 
one-to-one manner onto itself. The inverse function to $\sinh x$ thus exists, is 
defined on $\mathbb{R}$, is continuous, and is strictly increasing. 

This inverse is denoted $\operatorname{arsinh} y$ (read "area-sine of y").⁹ This function is 
easily expressed in terms of known functions. In solving the equation 

$$\frac{1}{2}(e^x - e^{-x}) = y$$

for $x$, we find successively 

$$e^x = y + \sqrt{1 + y^2}$$

($e^x > 0$, and so $e^x \ne y - \sqrt{1 + y^2}$) and 

$$x = \ln\left(y + \sqrt{1 + y^2}\right).$$

Thus, 

$$\operatorname{arsinh} y = \ln\left(y + \sqrt{1 + y^2}\right), \quad y \in \mathbb{R}.$$

⁹ The full name is area sinus hyperbolici (Lat.); the reason for using the term area 
here instead of arc, as in the case of the circular functions, will be explained 
later. 

Similarly, using the monotonicity of the function $y = \cosh x$ on the two 
intervals $\mathbb{R}_- = \{x \in \mathbb{R} \mid x \le 0\}$ and $\mathbb{R}_+ = \{x \in \mathbb{R} \mid x \ge 0\}$, we can construct 
functions $\operatorname{arcosh}_- y$ and $\operatorname{arcosh}_+ y$, defined for $y \ge 1$ and inverse to 
the function $\cosh x$ on $\mathbb{R}_-$ and $\mathbb{R}_+$ respectively. 

They are given by the formulas 

$$\operatorname{arcosh}_- y = \ln\left(y - \sqrt{y^2 - 1}\right),$$
$$\operatorname{arcosh}_+ y = \ln\left(y + \sqrt{y^2 - 1}\right).$$


From the definitions given above, we find 

$$\sinh' x = \frac{1}{2}(e^x + e^{-x}) = \cosh x,$$
$$\cosh' x = \frac{1}{2}(e^x - e^{-x}) = \sinh x,$$

and by the theorem on the derivative of an inverse function, we find 

$$\operatorname{arsinh}' y = \frac{1}{\sinh' x} = \frac{1}{\cosh x} = \frac{1}{\sqrt{1 + \sinh^2 x}} = \frac{1}{\sqrt{1 + y^2}},$$

$$\operatorname{arcosh}_-' y = \frac{1}{\cosh' x} = \frac{1}{\sinh x} = \frac{1}{-\sqrt{\cosh^2 x - 1}} = -\frac{1}{\sqrt{y^2 - 1}}, \quad y > 1,$$

$$\operatorname{arcosh}_+' y = \frac{1}{\cosh' x} = \frac{1}{\sinh x} = \frac{1}{\sqrt{\cosh^2 x - 1}} = \frac{1}{\sqrt{y^2 - 1}}, \quad y > 1.$$

These last three relations can be verified by using the explicit expressions 
for the inverse hyperbolic functions $\operatorname{arsinh} y$ and $\operatorname{arcosh} y$. 
For example, 

$$\operatorname{arsinh}' y = \frac{1}{y + \sqrt{1 + y^2}}\left(1 + \frac{1}{2}(1 + y^2)^{-1/2}\cdot 2y\right) = \frac{1}{y + \sqrt{1 + y^2}}\cdot\frac{\sqrt{1 + y^2} + y}{\sqrt{1 + y^2}} = \frac{1}{\sqrt{1 + y^2}}.$$

Like $\tan x$ and $\cot x$ one can consider the functions 

$$\tanh x = \frac{\sinh x}{\cosh x} \quad\text{and}\quad \coth x = \frac{\cosh x}{\sinh x},$$

called the hyperbolic tangent and hyperbolic cotangent respectively, and also 
the functions inverse to them, the area tangent 

$$\operatorname{artanh} y = \frac{1}{2}\ln\frac{1 + y}{1 - y}, \quad |y| < 1,$$

and the area cotangent 

$$\operatorname{arcoth} y = \frac{1}{2}\ln\frac{y + 1}{y - 1}, \quad |y| > 1.$$

We omit the solutions of the elementary equations that lead to these 
formulas. 
By the rules for differentiation we have 

$$\tanh' x = \frac{\sinh' x \cosh x - \sinh x \cosh' x}{\cosh^2 x} = \frac{\cosh x \cosh x - \sinh x \sinh x}{\cosh^2 x} = \frac{1}{\cosh^2 x},$$

$$\coth' x = \frac{\cosh' x \sinh x - \cosh x \sinh' x}{\sinh^2 x} = \frac{\sinh x \sinh x - \cosh x \cosh x}{\sinh^2 x} = -\frac{1}{\sinh^2 x}.$$

By the theorem on the derivative of an inverse function 

$$\operatorname{artanh}' y = \frac{1}{\tanh' x} = \frac{1}{\left(\frac{1}{\cosh^2 x}\right)} = \cosh^2 x = \frac{1}{1 - \tanh^2 x} = \frac{1}{1 - y^2}, \quad |y| < 1,$$

$$\operatorname{arcoth}' y = \frac{1}{\coth' x} = \frac{1}{\left(-\frac{1}{\sinh^2 x}\right)} = -\sinh^2 x = -\frac{1}{\coth^2 x - 1} = -\frac{1}{y^2 - 1}, \quad |y| > 1.$$
The last two formulas can also be verified by direct differentiation of the 
explicit formulas for the functions artanh y and arcoth y. 
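The explicit logarithmic formulas and the derivatives obtained in Example 14 can be sanity-checked numerically. The sketch below is an illustration only: the function names and the sample points are our own choices, and the difference quotient is just a rough check, not a proof.

```python
import math

def arsinh(y):
    """Explicit formula: ln(y + sqrt(1 + y^2)), defined for all real y."""
    return math.log(y + math.sqrt(1.0 + y * y))

def artanh(y):
    """Explicit formula: (1/2) ln((1 + y)/(1 - y)), defined for |y| < 1."""
    return 0.5 * math.log((1.0 + y) / (1.0 - y))

def arcoth(y):
    """Explicit formula: (1/2) ln((y + 1)/(y - 1)), defined for |y| > 1."""
    return 0.5 * math.log((y + 1.0) / (y - 1.0))

def central_difference(g, y, h=1e-6):
    """Symmetric difference quotient approximating g'(y)."""
    return (g(y + h) - g(y - h)) / (2.0 * h)

# (function, sample point, derivative predicted by the inverse-function theorem)
checks = [
    (arsinh, 0.7, 1.0 / math.sqrt(1.0 + 0.7 ** 2)),
    (artanh, 0.4, 1.0 / (1.0 - 0.4 ** 2)),
    (arcoth, 2.5, -1.0 / (2.5 ** 2 - 1.0)),
]
errors = [abs(central_difference(g, y) - expected) for g, y, expected in checks]
print(errors)
```

The first two explicit formulas can also be compared with the standard-library functions `math.asinh` and `math.atanh`.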


5.2.4 Table of Derivatives of the Basic Elementary Functions 


We now write out (see Table 5.1) the derivatives of the basic elementary 
functions computed in Sects. 5.1 and 5.2. 


5.2.5 Differentiation of a Very Simple Implicit Function 


Let $y = y(t)$ and $x = x(t)$ be differentiable functions defined in a neighborhood 
$U(t_0)$ of a point $t_0 \in \mathbb{R}$. Assume that the function $x = x(t)$ has an 
inverse $t = t(x)$ defined in a neighborhood $V(x_0)$ of $x_0 = x(t_0)$. Then the 
quantity $y = y(t)$, which depends on $t$, can also be regarded as an implicit 
function of $x$, since $y(t) = y(t(x))$. Let us find the derivative of this function 
with respect to $x$ at the point $x_0$, assuming that $x'(t_0) \ne 0$. Using the 
theorem on the differentiation of a composite function and the theorem on 
differentiation of an inverse function, we obtain 

$$y'_x\Big|_{x=x_0} = \frac{dy(t(x))}{dx}\Big|_{x=x_0} = \frac{dy(t)}{dt}\Big|_{t=t_0}\cdot\frac{dt(x)}{dx}\Big|_{x=x_0} = \frac{\frac{dy(t)}{dt}\Big|_{t=t_0}}{\frac{dx(t)}{dt}\Big|_{t=t_0}} = \frac{y'_t(t_0)}{x'_t(t_0)}.$$

(Here we have used the standard notation $f(x)\big|_{x=x_0} := f(x_0)$.) 
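
Table 5.1. 

No. | Function f(x) | Derivative f'(x) | Restrictions on the domain of x ∈ R 
1. | $C$ (const) | $0$ | 
2. | $x^\alpha$ | $\alpha x^{\alpha-1}$ | $x > 0$ for $\alpha \in \mathbb{R}$; $x \in \mathbb{R}$ for $\alpha \in \mathbb{N}$ 
3. | $a^x$ | $a^x \ln a$ | $x \in \mathbb{R}$ ($a > 0$, $a \ne 1$) 
4. | $\log_a |x|$ | $\frac{1}{x \ln a}$ | $x \in \mathbb{R} \setminus 0$ ($a > 0$, $a \ne 1$) 
5. | $\sin x$ | $\cos x$ | 
6. | $\cos x$ | $-\sin x$ | 
7. | $\tan x$ | $\frac{1}{\cos^2 x}$ | $x \ne \frac{\pi}{2} + \pi k$, $k \in \mathbb{Z}$ 
8. | $\cot x$ | $-\frac{1}{\sin^2 x}$ | $x \ne \pi k$, $k \in \mathbb{Z}$ 
9. | $\arcsin x$ | $\frac{1}{\sqrt{1 - x^2}}$ | $|x| < 1$ 
10. | $\arccos x$ | $-\frac{1}{\sqrt{1 - x^2}}$ | $|x| < 1$ 
11. | $\arctan x$ | $\frac{1}{1 + x^2}$ | 
12. | $\operatorname{arccot} x$ | $-\frac{1}{1 + x^2}$ | 
13. | $\sinh x$ | $\cosh x$ | 
14. | $\cosh x$ | $\sinh x$ | 
15. | $\tanh x$ | $\frac{1}{\cosh^2 x}$ | 
16. | $\coth x$ | $-\frac{1}{\sinh^2 x}$ | $x \ne 0$ 
17. | $\operatorname{arsinh} x = \ln\left(x + \sqrt{1 + x^2}\right)$ | $\frac{1}{\sqrt{1 + x^2}}$ | 
18. | $\operatorname{arcosh} x = \ln\left(x \pm \sqrt{x^2 - 1}\right)$ | $\pm\frac{1}{\sqrt{x^2 - 1}}$ | $x > 1$ 
19. | $\operatorname{artanh} x = \frac{1}{2}\ln\frac{1 + x}{1 - x}$ | $\frac{1}{1 - x^2}$ | $|x| < 1$ 
20. | $\operatorname{arcoth} x = \frac{1}{2}\ln\frac{x + 1}{x - 1}$ | $-\frac{1}{x^2 - 1}$ | $|x| > 1$ 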


If the same quantity is regarded as a function of different arguments, in 
order to avoid misunderstandings in differentiation, we indicate explicitly the 
variable with respect to which the differentiation is carried out, as we have 
done here. 
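The rule $y'_x = y'_t / x'_t$ of this subsection can be illustrated with a concrete pair of functions. The sketch below is an assumption-laden example, not from the text: we choose $x(t) = t - \sin t$, $y(t) = 1 - \cos t$ (a cycloid) and compare the rule with the slope of a short secant of the curve.

```python
import math

def x_of_t(t):
    return t - math.sin(t)

def y_of_t(t):
    return 1.0 - math.cos(t)

def dy_dx(t):
    """y'_x = y'_t / x'_t at parameter value t; requires x'_t != 0."""
    x_t = 1.0 - math.cos(t)  # derivative of x(t) with respect to t
    y_t = math.sin(t)        # derivative of y(t) with respect to t
    if x_t == 0.0:
        raise ValueError("x'(t) = 0: the rule does not apply at this t")
    return y_t / x_t

def secant_slope(t, h=1e-6):
    """Slope of the secant through the curve points at t - h and t + h."""
    return (y_of_t(t + h) - y_of_t(t - h)) / (x_of_t(t + h) - x_of_t(t - h))

t0 = 1.0  # illustrative parameter value with x'(t0) != 0
print(dy_dx(t0), secant_slope(t0))
```

For this particular curve the quotient simplifies to $\cot(t/2)$, which gives a second check of the computed value.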


Example 15. The law of addition of velocities. The motion of a point along 
a line is completely determined if we know the coordinate x of the point in 
our chosen coordinate system (the real line) at each instant t in a system we 
have chosen for measuring time. Thus the pair of numbers (x,t) determines 
the position of the point in space and time. The law of motion is written in 
the form of a function $x = x(t)$. 

Suppose we wish to express the motion of this point in terms of a different 
coordinate system $(\tilde{x}, \tilde{t})$. For example, the new real line may be moving uniformly 
with speed $-v$ relative to the first system. (The velocity vector in this 
case may be identified with the single number that defines it.) For simplicity 
we shall assume that the coordinates (0,0) refer to the same point in both 
systems; more precisely, that at time $\tilde{t} = 0$ the point $\tilde{x} = 0$ coincided with 
the point $x = 0$ at which the clock showed $t = 0$. 

Then one of the possible connections between the coordinate systems $(x, t)$ 
and $(\tilde{x}, \tilde{t})$ describing the motion of the same point observed from different 
coordinate systems is provided by the classical Galilean transformations: 

$$\tilde{x} = x + vt, \qquad \tilde{t} = t. \tag{5.29}$$

Let us consider a somewhat more general linear connection 

$$\tilde{x} = \alpha x + \beta t, \qquad \tilde{t} = \gamma x + \delta t, \tag{5.30}$$

assuming, of course, that this connection is invertible, that is, the determinant 
of the matrix $\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ is not zero. 

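Let $x = x(t)$ and $\tilde{x} = \tilde{x}(\tilde{t})$ be the law of motion for the point under 
observation, written in these coordinate systems. 

We remark that, knowing the relation $x = x(t)$, we find by formula (5.30) 
that 

$$\tilde{x}(t) = \alpha x(t) + \beta t, \qquad \tilde{t}(t) = \gamma x(t) + \delta t, \tag{5.31}$$

and since the transformation (5.30) is invertible, after writing 

$$x = \tilde{\alpha}\tilde{x} + \tilde{\beta}\tilde{t}, \qquad t = \tilde{\gamma}\tilde{x} + \tilde{\delta}\tilde{t}, \tag{5.32}$$

knowing $\tilde{x} = \tilde{x}(\tilde{t})$, we find 

$$x(\tilde{t}) = \tilde{\alpha}\tilde{x}(\tilde{t}) + \tilde{\beta}\tilde{t}, \qquad t(\tilde{t}) = \tilde{\gamma}\tilde{x}(\tilde{t}) + \tilde{\delta}\tilde{t}. \tag{5.33}$$

It is clear from relations (5.31) and (5.33) that for the given point there 
exist mutually inverse functions $\tilde{t} = \tilde{t}(t)$ and $t = t(\tilde{t})$. 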
We now consider the problem of the connection between the velocities 

$$V(t) = \frac{dx(t)}{dt} = \dot{x}_t(t) \quad\text{and}\quad \tilde{V}(\tilde{t}) = \frac{d\tilde{x}(\tilde{t})}{d\tilde{t}} = \dot{\tilde{x}}_{\tilde{t}}(\tilde{t})$$

of the point computed in the coordinate systems $(x, t)$ and $(\tilde{x}, \tilde{t})$ respectively. 
Using the rule for differentiating an implicit function and formula (5.31), 
we have 

$$\frac{d\tilde{x}}{d\tilde{t}} = \frac{\frac{d\tilde{x}}{dt}}{\frac{d\tilde{t}}{dt}} = \frac{\alpha\frac{dx}{dt} + \beta}{\gamma\frac{dx}{dt} + \delta},$$

or 

$$\tilde{V}(\tilde{t}) = \frac{\alpha V(t) + \beta}{\gamma V(t) + \delta}, \tag{5.34}$$

where $\tilde{t}$ and $t$ are the coordinates of the same instant of time in the systems 
$(x, t)$ and $(\tilde{x}, \tilde{t})$. This is always to be kept in mind in the abbreviated notation 

$$\tilde{V} = \frac{\alpha V + \beta}{\gamma V + \delta} \tag{5.35}$$

for formula (5.34). 
In the case of the Galilean transformations (5.29) we obtain the classical 
law of addition of velocities from formula (5.35): 

$$\tilde{V} = V + v. \tag{5.36}$$


It has been established experimentally with a high degree of precision (and 
this became one of the postulates of the special theory of relativity) that in 
a vacuum light propagates with a certain velocity $c$ that is independent of 
the state of motion of the radiating body. This means that if an explosion 
occurs at time $t = \tilde{t} = 0$ at the point $x = \tilde{x} = 0$, the light will reach the 
points $x$ with coordinates such that $x^2 = (ct)^2$ after time $t$ in the coordinate 
system $(x, t)$, while in the system $(\tilde{x}, \tilde{t})$ this event will correspond to time $\tilde{t}$ 
and coordinates $\tilde{x}$, where again $\tilde{x}^2 = (c\tilde{t})^2$. 

Thus, if $x^2 - c^2t^2 = 0$, then $\tilde{x}^2 - c^2\tilde{t}^2 = 0$ also, and conversely. By virtue of 
certain additional physical considerations, one must consider that, in general 

$$x^2 - c^2t^2 = \tilde{x}^2 - c^2\tilde{t}^2 \tag{5.37}$$

if $(x, t)$ and $(\tilde{x}, \tilde{t})$ correspond to the same event in the different coordinate 
systems connected by relation (5.30). Conditions (5.37) give the following 
relations on the coefficients $\alpha$, $\beta$, $\gamma$, and $\delta$ of the transformation (5.30): 

$$\alpha^2 - c^2\gamma^2 = 1, \qquad \alpha\beta - c^2\gamma\delta = 0, \qquad \beta^2 - c^2\delta^2 = -c^2. \tag{5.38}$$

If $c = 1$, we would have, instead of (5.38), 

$$\alpha^2 - \gamma^2 = 1, \qquad \frac{\beta}{\delta} = \frac{\gamma}{\alpha}, \qquad \beta^2 - \delta^2 = -1, \tag{5.39}$$

from which it follows easily that the general solution of (5.39) (up to a change 
of sign in the pairs $(\alpha, \beta)$ and $(\gamma, \delta)$) can be given as 

$$\alpha = \cosh\varphi, \quad \gamma = \sinh\varphi, \quad \beta = \sinh\varphi, \quad \delta = \cosh\varphi,$$

where $\varphi$ is a parameter. 
The general solution of the system (5.38) then has the form 

$$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} = \begin{pmatrix} \cosh\varphi & c\sinh\varphi \\ \frac{1}{c}\sinh\varphi & \cosh\varphi \end{pmatrix}$$

and the transformation (5.30) can be made specific: 

$$\tilde{x} = \cosh\varphi\, x + c\sinh\varphi\, t, \qquad \tilde{t} = \frac{1}{c}\sinh\varphi\, x + \cosh\varphi\, t. \tag{5.40}$$

This is the Lorentz transformation. 

In order to clarify the way in which the free parameter $\varphi$ is determined, 
we recall that the $\tilde{x}$-axis is moving with speed $-v$ relative to the $x$-axis, that 
is, the point $\tilde{x} = 0$ of this axis, when observed in the system $(x, t)$, has velocity 
$-v$. Setting $\tilde{x} = 0$ in (5.40), we find its law of motion in the system $(x, t)$: 

$$x = -c\tanh\varphi\, t.$$

Therefore, 

$$\tanh\varphi = \frac{v}{c}. \tag{5.41}$$

Comparing the general law (5.35) of transformation of velocities with the 
Lorentz transformation (5.40), we obtain 

$$\tilde{V} = \frac{\cosh\varphi\, V + c\sinh\varphi}{\frac{1}{c}\sinh\varphi\, V + \cosh\varphi},$$

or, taking account of (5.41), 

$$\tilde{V} = \frac{V + v}{1 + \frac{vV}{c^2}}. \tag{5.42}$$

Formula (5.42) is the relativistic law of addition of velocities, which for 
$|vV| \ll c^2$, that is, as $c \to \infty$, becomes the classical law expressed by formula 
(5.36). 
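The difference between the classical and relativistic laws of addition of velocities is easy to see numerically. The following sketch is illustrative only: the function names are ours, and the numerical value of the speed of light in metres per second is an input we supply, not something stated in the text.

```python
def add_velocities_classical(V, v):
    """Classical (Galilean) law: V~ = V + v."""
    return V + v

def add_velocities_relativistic(V, v, c):
    """Relativistic law: V~ = (V + v) / (1 + v V / c^2)."""
    return (V + v) / (1.0 + v * V / (c * c))

c = 299_792_458.0  # speed of light in m/s (assumed numerical value)

# Everyday speeds: the two laws are practically indistinguishable.
slow = add_velocities_relativistic(30.0, 20.0, c)

# Two speeds of c/2 compose to 0.8 c, not to c.
half_and_half = add_velocities_relativistic(0.5 * c, 0.5 * c, c)

# A signal moving with speed c keeps speed c in the moving frame.
light = add_velocities_relativistic(c, 0.25 * c, c)

print(slow, half_and_half / c, light / c)
```

The last computation reflects the postulate from which the Lorentz transformation was derived: substituting $V = c$ gives $(c + v)/(1 + v/c) = c$ for any $v$.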

The Lorentz transformation (5.40) itself can be rewritten taking account 
of relation (5.41) in the following more natural form: 

$$\tilde{x} = \frac{x + vt}{\sqrt{1 - \left(\frac{v}{c}\right)^2}}, \qquad \tilde{t} = \frac{t + \frac{v}{c^2}x}{\sqrt{1 - \left(\frac{v}{c}\right)^2}}, \tag{5.43}$$

from which one can see that for $|v| \ll c$, that is, as $c \to \infty$, they become the 
classical Galilean transformations (5.29). 


5.2.6 Higher-order Derivatives 


If a function $f : E \to \mathbb{R}$ is differentiable at every point $x \in E$, then a new 
function $f' : E \to \mathbb{R}$ arises, whose value at a point $x \in E$ equals the derivative 
$f'(x)$ of the function $f$ at that point. 

The function $f' : E \to \mathbb{R}$ may itself have a derivative $(f')' : E \to \mathbb{R}$ on $E$, 
called the second derivative of the original function $f$ and denoted by one of 
the following two symbols: 

$$f''(x), \qquad \frac{d^2 f(x)}{dx^2},$$

and if we wish to indicate explicitly the variable of differentiation in the first 
case, we also write, for example, $f''_{xx}(x)$. 

Definition. By induction, if the derivative $f^{(n-1)}(x)$ of order $n - 1$ of $f$ has 
been defined, then the derivative of order $n$ is defined by the formula 

$$f^{(n)}(x) := (f^{(n-1)})'(x).$$

The following notations are conventional for the derivative of order $n$: 

$$f^{(n)}(x), \qquad \frac{d^n f(x)}{dx^n}.$$

Also by convention, $f^{(0)}(x) := f(x)$. 

The set of functions $f : E \to \mathbb{R}$ having continuous derivatives up to order 
$n$ inclusive will be denoted $C^{(n)}(E, \mathbb{R})$, and by the simpler symbol $C^{(n)}(E)$, 
or $C^n(E, \mathbb{R})$ and $C^n(E)$ respectively wherever no confusion can arise. 

In particular $C^{(0)}(E) = C(E)$ by our convention that $f^{(0)}(x) = f(x)$. 

Let us now consider some examples of the computation of higher-order 
derivatives. 


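Examples 

No. | $f(x)$ | $f'(x)$ | $f''(x)$ | ... | $f^{(n)}(x)$ 
16) | $a^x$ | $a^x \ln a$ | $a^x \ln^2 a$ | ... | $a^x \ln^n a$ 
17) | $e^x$ | $e^x$ | $e^x$ | ... | $e^x$ 
18) | $\sin x$ | $\cos x$ | $-\sin x$ | ... | $\sin(x + n\pi/2)$ 
19) | $\cos x$ | $-\sin x$ | $-\cos x$ | ... | $\cos(x + n\pi/2)$ 
20) | $(1 + x)^\alpha$ | $\alpha(1 + x)^{\alpha-1}$ | $\alpha(\alpha - 1)(1 + x)^{\alpha-2}$ | ... | $\alpha(\alpha - 1)\cdots(\alpha - n + 1)(1 + x)^{\alpha-n}$ 
21) | $x^\alpha$ | $\alpha x^{\alpha-1}$ | $\alpha(\alpha - 1)x^{\alpha-2}$ | ... | $\alpha(\alpha - 1)\cdots(\alpha - n + 1)x^{\alpha-n}$ 
22) | $\log_a |x|$ | $\frac{1}{\ln a}x^{-1}$ | $\frac{-1}{\ln a}x^{-2}$ | ... | $\frac{(-1)^{n-1}(n-1)!}{\ln a}x^{-n}$ 
23) | $\ln |x|$ | $x^{-1}$ | $(-1)x^{-2}$ | ... | $(-1)^{n-1}(n-1)!\,x^{-n}$ 

Example 24. Leibniz' formula. Let $u(x)$ and $v(x)$ be functions having derivatives 
up to order $n$ inclusive on a common set $E$. The following formula of 
Leibniz holds for the $n$th derivative of their product: 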


$$(uv)^{(n)} = \sum_{m=0}^{n}\binom{n}{m}u^{(n-m)}v^{(m)}. \tag{5.44}$$

Leibniz' formula bears a strong resemblance to Newton's binomial formula, 
and in fact the two are directly connected. 

Proof. For $n = 1$ formula (5.44) agrees with the rule already established for 
the derivative of a product. 

If the functions $u$ and $v$ have derivatives up to order $n + 1$ inclusive, then, 
assuming that formula (5.44) holds for order $n$, after differentiating the left- 
and right-hand sides, we find 

$$(uv)^{(n+1)} = \sum_{m=0}^{n}\binom{n}{m}u^{(n-m+1)}v^{(m)} + \sum_{m=0}^{n}\binom{n}{m}u^{(n-m)}v^{(m+1)} =$$
$$= u^{(n+1)}v^{(0)} + \sum_{k=1}^{n}\left(\binom{n}{k} + \binom{n}{k-1}\right)u^{((n+1)-k)}v^{(k)} + u^{(0)}v^{(n+1)} = \sum_{k=0}^{n+1}\binom{n+1}{k}u^{((n+1)-k)}v^{(k)}.$$

Here we have combined the terms containing like products of derivatives 
of the functions $u$ and $v$ and used the binomial relation $\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}$. 

Thus by induction we have established the validity of Leibniz' formula. □ 
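Leibniz' formula can be tested exactly on polynomials, where differentiation and multiplication are simple operations on coefficient lists. The sketch below is only an illustration: the representation (a list `[c0, c1, ...]` of integer coefficients), the helper names, and the sample polynomials are all our own assumptions.

```python
from math import comb

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Sum of two polynomials; the shorter list is padded with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_diff(p, order=1):
    """Derivative of the given order; the zero polynomial is represented by [0]."""
    for _ in range(order):
        p = [k * p[k] for k in range(1, len(p))] or [0]
    return p

def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def leibniz(u, v, n):
    """Right-hand side of (5.44): sum over m of C(n, m) u^(n-m) v^(m)."""
    total = [0]
    for m in range(n + 1):
        term = poly_mul(poly_diff(u, n - m), poly_diff(v, m))
        total = poly_add(total, [comb(n, m) * c for c in term])
    return trim(total)

u = [1, 2, 0, 3]  # 1 + 2x + 3x^3 (illustrative)
v = [0, -1, 4]    # -x + 4x^2     (illustrative)
direct = trim(poly_diff(poly_mul(u, v), 3))  # third derivative of the product
print(direct, leibniz(u, v, 3))
```

Because the coefficients are integers, the comparison is exact and involves no rounding.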
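Example 25. If $P_n(x) = c_0 + c_1x + \cdots + c_nx^n$, then 

$P_n(0) = c_0$, 
$P_n'(x) = c_1 + 2c_2x + \cdots + nc_nx^{n-1}$ and $P_n'(0) = c_1$, 
$P_n''(x) = 2c_2 + 3\cdot 2c_3x + \cdots + n(n-1)c_nx^{n-2}$ and $P_n''(0) = 2!\,c_2$, 
$P_n^{(3)}(x) = 3\cdot 2c_3 + \cdots + n(n-1)(n-2)c_nx^{n-3}$ and $P_n^{(3)}(0) = 3!\,c_3$, 
............................................ 
$P_n^{(n)}(x) = n(n-1)(n-2)\cdots 2c_n$ and $P_n^{(n)}(0) = n!\,c_n$, 
$P_n^{(k)}(x) = 0$ for $k > n$. 

Thus, the polynomial $P_n(x)$ can be written as 

$$P_n(x) = P_n^{(0)}(0) + \frac{1}{1!}P_n^{(1)}(0)x + \frac{1}{2!}P_n^{(2)}(0)x^2 + \cdots + \frac{1}{n!}P_n^{(n)}(0)x^n.$$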




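Example 26. Using Leibniz' formula and the fact that all the derivatives of 
a polynomial of order higher than the degree of the polynomial are zero, we 
can find the $n$th derivative of $f(x) = x^2\sin x$: 

$$f^{(n)}(x) = \sin^{(n)}(x)\cdot x^2 + \binom{n}{1}\sin^{(n-1)}x\cdot 2x + \binom{n}{2}\sin^{(n-2)}x\cdot 2 =$$
$$= x^2\sin\left(x + n\frac{\pi}{2}\right) + 2nx\sin\left(x + (n-1)\frac{\pi}{2}\right) + \left(-n(n-1)\sin\left(x + n\frac{\pi}{2}\right)\right) =$$
$$= \left(x^2 - n(n-1)\right)\sin\left(x + n\frac{\pi}{2}\right) - 2nx\cos\left(x + n\frac{\pi}{2}\right).$$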
Example 27. Let $f(x) = \arctan x$. Let us find the values $f^{(n)}(0)$ 
($n = 1, 2, \ldots$). 
Since $f'(x) = \frac{1}{1+x^2}$, it follows that $(1 + x^2)f'(x) = 1$. 
Applying Leibniz' formula to this last equality, we find the recursion relation 

$$(1 + x^2)f^{(n+1)}(x) + 2nxf^{(n)}(x) + n(n-1)f^{(n-1)}(x) = 0,$$

from which one can successively find all the derivatives of $f(x)$. 
Setting $x = 0$, we obtain 

$$f^{(n+1)}(0) = -n(n-1)f^{(n-1)}(0).$$

For $n = 1$ we find $f^{(2)}(0) = 0$, and therefore $f^{(2n)}(0) = 0$. For derivatives 
of odd order we have 

$$f^{(2m+1)}(0) = -2m(2m-1)f^{(2m-1)}(0)$$

and since $f'(0) = 1$, we obtain 

$$f^{(2m+1)}(0) = (-1)^m(2m)!.$$
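The recursion of Example 27 is easy to run mechanically. The following minimal sketch (the function name and the cutoff order 9 are illustrative choices) generates the values $f^{(n)}(0)$ for $f = \arctan$ from the recursion $f^{(n+1)}(0) = -n(n-1)f^{(n-1)}(0)$ and compares the odd-order values with the closed form $(-1)^m(2m)!$.

```python
from math import factorial

def arctan_derivatives_at_zero(max_order):
    """Return [f(0), f'(0), ..., f^(max_order)(0)] for f = arctan,
    using f^(n+1)(0) = -n (n - 1) f^(n-1)(0) with f(0) = 0 and f'(0) = 1."""
    d = [0] * (max_order + 1)  # d[0] = arctan(0) = 0
    if max_order >= 1:
        d[1] = 1               # f'(0) = 1 / (1 + 0^2) = 1
    for n in range(1, max_order):
        d[n + 1] = -n * (n - 1) * d[n - 1]
    return d

d = arctan_derivatives_at_zero(9)
closed_form = [(-1) ** m * factorial(2 * m) for m in range(5)]
print(d)
print(closed_form)
```

The even-order entries of `d` are zero, and the odd-order entries reproduce the closed form.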


Example 28. Acceleration. If $x = x(t)$ denotes the time dependence of the coordinate of a point 
mass moving along the real line, then $\frac{dx(t)}{dt} = \dot{x}(t)$ is the velocity of the point, 
and then $\frac{d\dot{x}(t)}{dt} = \frac{d^2x(t)}{dt^2} = \ddot{x}(t)$ is its acceleration at time $t$. 

If $x(t) = \alpha t + \beta$, then $\dot{x}(t) = \alpha$ and $\ddot{x}(t) \equiv 0$, that is, the acceleration in 
a uniform motion is zero. We shall soon verify that if the second derivative 
equals zero, then the function itself has the form $\alpha t + \beta$. Thus, in uniform 
motions, and only in uniform motions, is the acceleration equal to zero. 

But if we wish for a body moving under inertia in empty space to move 
uniformly in a straight line when observed in two different coordinate systems, 
it is necessary for the transition formulas from one inertial system to the other 
to be linear. That is the reason why, in Example 15, the linear formulas (5.30) 
were chosen for the coordinate transformations. 

Example 29. The second derivative of a simple implicit function. Let $y = y(t)$ 
and $x = x(t)$ be twice-differentiable functions. Assume that the function 
$x = x(t)$ has a differentiable inverse function $t = t(x)$. Then the quantity $y(t)$ 
can be regarded as an implicit function of $x$, since $y = y(t) = y(t(x))$. Let us 
find the second derivative $y''_{xx}$ assuming that $x'(t) \ne 0$. 

By the rule for differentiating such a function, studied in Subsect. 5.2.5, 
we have 

$$y'_x = \frac{y'_t}{x'_t},$$

so that 

$$y''_{xx} = (y'_x)'_x = \frac{(y'_x)'_t}{x'_t} = \frac{\left(\frac{y'_t}{x'_t}\right)'_t}{x'_t} = \frac{\frac{y''_{tt}x'_t - y'_tx''_{tt}}{(x'_t)^2}}{x'_t} = \frac{x'_ty''_{tt} - x''_{tt}y'_t}{(x'_t)^3}.$$

We remark that the explicit expressions for all the functions that occur 
here, including $y''_{xx}$, depend on $t$, but they make it possible to obtain the value 
of $y''_{xx}$ at the particular point $x$ after substituting for $t$ the value $t = t(x)$ 
corresponding to the value $x$. 


For example, if $y = e^t$ and $x = \ln t$, then 

$$y'_x = \frac{y'_t}{x'_t} = \frac{e^t}{1/t} = te^t, \qquad y''_{xx} = \frac{(y'_x)'_t}{x'_t} = \frac{e^t + te^t}{1/t} = t(t+1)e^t.$$

We have deliberately chosen this simple example, in which one can explicitly 
express $t$ in terms of $x$ as $t = e^x$ and, by substituting $t = e^x$ into 
$y(t) = e^t$, find the explicit dependence of $y = e^{e^x}$ on $x$. Differentiating this 
last function, one can justify the results obtained above. 
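The direct verification mentioned in the text can be written out as follows. This is a sketch of the check, using only the chain and product rules applied to the explicit function $y = e^{e^x}$.

```latex
% Direct differentiation of y = e^{e^x}, followed by the substitution t = e^x.
\[
  y'_x = \left(e^{e^x}\right)' = e^{x}\,e^{e^x} = t\,e^{t},
\]
\[
  y''_{xx} = \left(e^{x}\,e^{e^x}\right)' = e^{x}\,e^{e^x} + e^{x}\cdot e^{x}\,e^{e^x}
           = \left(e^{x} + e^{2x}\right)e^{e^x} = t(t+1)\,e^{t}.
\]
```

Both expressions coincide with the values obtained from the parametric formulas.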

It is clear that in this way one can find the derivatives of any order by 
successively applying the formula 

$$y^{(n)}_{x^n} = \frac{\left(y^{(n-1)}_{x^{n-1}}\right)'_t}{x'_t}.$$


5.2.7 Problems and Exercises 


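1. Let $\alpha_0, \alpha_1, \ldots, \alpha_n$ be given real numbers. Exhibit a polynomial $P_n(x)$ of degree 
$n$ having the derivatives $P_n^{(k)}(x_0) = \alpha_k$, $k = 0, 1, \ldots, n$, at a given point $x_0 \in \mathbb{R}$. 


2. Compute $f'(x)$ if 

a) $f(x) = \begin{cases} \exp\left(-\frac{1}{x^2}\right) & \text{for } x \ne 0, \\ 0 & \text{for } x = 0; \end{cases}$ 

b) $f(x) = \begin{cases} x^2\sin\frac{1}{x} & \text{for } x \ne 0, \\ 0 & \text{for } x = 0. \end{cases}$ 

c) Verify that the function in part a) is infinitely differentiable on $\mathbb{R}$, and that 
$f^{(n)}(0) = 0$. 

d) Show that the derivative in part b) is defined on $\mathbb{R}$ but is not a continuous 
function on $\mathbb{R}$. 

e) Show that the function 

$$f(x) = \begin{cases} \exp\left(-\frac{1}{(1+x)^2} - \frac{1}{(1-x)^2}\right) & \text{for } -1 < x < 1, \\ 0 & \text{for } 1 \le |x| \end{cases}$$

is infinitely differentiable on $\mathbb{R}$. 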


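3. Let $f \in C^{(\infty)}(\mathbb{R})$. Show that for $x \ne 0$ 

$$\frac{1}{x^{n+1}}f^{(n)}\left(\frac{1}{x}\right) = (-1)^n\frac{d^n}{dx^n}\left(x^{n-1}f\left(\frac{1}{x}\right)\right).$$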


4. Let f be a differentiable function on R. Show that 
a) if f is an even function, then f’ is an odd function; 
b) if f is an odd function, then f’ is an even function; 
c) ($f'$ is odd) $\Leftrightarrow$ ($f$ is even). 


5. Show that 


a) the function $f(x)$ is differentiable at the point $x_0$ if and only if $f(x) - f(x_0) = \varphi(x)(x - x_0)$, 
where $\varphi(x)$ is a function that is continuous at $x_0$ (and in that case 
$\varphi(x_0) = f'(x_0)$); 

b) if $f(x) - f(x_0) = \varphi(x)(x - x_0)$ and $\varphi \in C^{(n-1)}(U(x_0))$, where $U(x_0)$ is a 
neighborhood of $x_0$, then $f(x)$ has a derivative $f^{(n)}(x_0)$ of order $n$ at $x_0$. 


6. Give an example showing that the assumption that $f^{-1}$ be continuous at the 
point $y_0$ cannot be omitted from Theorem 3. 


7. a) Two bodies with masses $m_1$ and $m_2$ respectively are moving in space under 
the action of their mutual gravitation alone. Using Newton's laws (formulas (5.1) 
and (5.2) of Sect. 5.1), verify that the quantity 

$$E = \left(\frac{1}{2}m_1v_1^2 + \frac{1}{2}m_2v_2^2\right) + \left(-G\frac{m_1m_2}{r}\right) =: K + U,$$

where $v_1$ and $v_2$ are the velocities of the bodies and $r$ the distance between them, 
does not vary during this motion. 


b) Give a physical interpretation of the quantity E = K+U and its components. 


c) Extend this result to the case of the motion of n bodies. 


5.3 The Basic Theorems of Differential Calculus 


5.3.1 Fermat’s Lemma and Rolle’s Theorem 


Definition 1. A point $x_0 \in E \subset \mathbb{R}$ is called a local maximum (resp. local 
minimum) and the value of a function $f : E \to \mathbb{R}$ at that point a local 
maximum value (resp. local minimum value) if there exists a neighborhood 
$U_E(x_0)$ of $x_0$ in $E$ such that at any point $x \in U_E(x_0)$ we have $f(x) \le f(x_0)$ 
(resp. $f(x) \ge f(x_0)$). 

Definition 2. If the strict inequality $f(x) < f(x_0)$ (resp. $f(x) > f(x_0)$) 
holds at every point $x \in U_E(x_0) \setminus x_0 = \mathring{U}_E(x_0)$, the point $x_0$ is called a strict 
local maximum (resp. strict local minimum) and the value of the function 
$f : E \to \mathbb{R}$ a strict local maximum value (resp. strict local minimum value). 


Definition 3. The local maxima and minima are called local extrema and 
the values of the function at these points local extreme values of the function. 


Example 1. Let 

$$f(x) = \begin{cases} x^2 & \text{if } -1 \le x < 2, \\ 4 & \text{if } 2 \le x \end{cases}$$

(see Fig. 5.8). For this function 
x = −1 is a strict local maximum; 
x = 0 is a strict local minimum; 
x = 2 is a local maximum; 
the points x > 2 are all local extrema, being simultaneously maxima and 
minima, since the function is locally constant at these points. 

Fig. 5.8. 


Example 2. Let $f(x) = \sin\frac{1}{x}$ on the set $E = \mathbb{R} \setminus 0$. 
The points $x = \left(\frac{\pi}{2} + 2k\pi\right)^{-1}$, $k \in \mathbb{Z}$, are strict local maxima, and the 
points $x = \left(-\frac{\pi}{2} + 2k\pi\right)^{-1}$, $k \in \mathbb{Z}$, are strict local minima for $f(x)$ (see 
Fig. 4.1). 

Definition 4. An extremum $x_0 \in E$ of the function $f : E \to \mathbb{R}$ is called an 
interior extremum if $x_0$ is a limit point of both sets $E_- = \{x \in E \mid x < x_0\}$ 
and $E_+ = \{x \in E \mid x > x_0\}$. 


In Example 2, all the extrema are interior extrema, while in Example 1 
the point $x = -1$ is not an interior extremum. 


Lemma 1. (Fermat). If a function $f : E \to \mathbb{R}$ is differentiable at an interior 
extremum $x_0 \in E$, then its derivative at $x_0$ is 0: $f'(x_0) = 0$. 


Proof. By definition of differentiability at $x_0$ we have 

$$f(x_0 + h) - f(x_0) = f'(x_0)h + \alpha(x_0; h)h,$$

where $\alpha(x_0; h) \to 0$ as $h \to 0$, $x_0 + h \in E$. 
Let us rewrite this relation as follows: 

$$f(x_0 + h) - f(x_0) = [f'(x_0) + \alpha(x_0; h)]h. \tag{5.45}$$

Since $x_0$ is an extremum, the left-hand side of Eq. (5.45) is either nonnegative 
or nonpositive for all values of $h$ sufficiently close to 0 and for which 
$x_0 + h \in E$. 

If $f'(x_0) \ne 0$, then for $h$ sufficiently close to 0 the quantity $f'(x_0) + \alpha(x_0; h)$ 
would have the same sign as $f'(x_0)$, since $\alpha(x_0; h) \to 0$ as $h \to 0$, $x_0 + h \in E$. 

But the value of $h$ can be either positive or negative, given that $x_0$ is an 
interior extremum. 

Thus, assuming that $f'(x_0) \ne 0$, we find that the right-hand side of (5.45) 
changes sign when $h$ does (for $h$ sufficiently close to 0), while the left-hand 
side cannot change sign when $h$ is sufficiently close to 0. This contradiction 
completes the proof. □ 


Remarks on Fermat's Lemma 1°. Fermat's lemma thus gives a necessary 
condition for an interior extremum of a differentiable function. For noninterior 
extrema (such as the point $x = -1$ in Example 1) it is generally not true 
that $f'(x_0) = 0$. 


2°. Geometrically this lemma is obvious, since it asserts that at an extremum 
of a differentiable function the tangent to its graph is horizontal. (After all, 
$f'(x_0)$ is the tangent of the angle the tangent line makes with the x-axis.) 


3°. Physically this lemma means that in motion along a line the velocity must 

be zero at the instant when the direction reverses (which is an extremum!). 
This lemma and the theorem on the maximum (or minimum) of a contin- 

uous function on a closed interval together imply the following proposition. 


Proposition 1. (Rolle's¹⁰ theorem). If a function $f : [a, b] \to \mathbb{R}$ is continuous 
on a closed interval $[a, b]$ and differentiable on the open interval $]a, b[$ 
and $f(a) = f(b)$, then there exists a point $\xi \in\, ]a, b[$ such that $f'(\xi) = 0$. 


¹⁰ M. Rolle (1652-1719), French mathematician. 

Proof. Since the function $f$ is continuous on $[a, b]$, there exist points $x_m, x_M \in [a, b]$ 
at which it assumes its minimal and maximal values respectively. If 
$f(x_m) = f(x_M)$, then the function is constant on $[a, b]$; and since in that 
case $f'(x) \equiv 0$, the assertion is obviously true. If $f(x_m) < f(x_M)$, then, since 
$f(a) = f(b)$, one of the points $x_m$ and $x_M$ must lie in the open interval $]a, b[$. 
We denote it by $\xi$. Fermat's lemma now implies that $f'(\xi) = 0$. □ 


5.3.2 The Theorems of Lagrange and Cauchy on Finite Increments 


The following proposition is one of the most frequently used and important 
methods of studying numerical-valued functions. 


Theorem 1. (Lagrange's finite-increment theorem). If a function $f : [a, b] \to \mathbb{R}$ 
is continuous on a closed interval $[a, b]$ and differentiable on the open 
interval $]a, b[$, there exists a point $\xi \in\, ]a, b[$ such that 

$$f(b) - f(a) = f'(\xi)(b - a). \tag{5.46}$$

Proof. Consider the auxiliary function 

$$F(x) = f(x) - \frac{f(b) - f(a)}{b - a}(x - a),$$

which is obviously continuous on the closed interval $[a, b]$ and differentiable on 
the open interval $]a, b[$ and has equal values at the endpoints: $F(a) = F(b) = f(a)$. 
Applying Rolle's theorem to $F(x)$, we find a point $\xi \in\, ]a, b[$ at which 

$$F'(\xi) = f'(\xi) - \frac{f(b) - f(a)}{b - a} = 0. \quad □$$
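Lagrange's theorem asserts only that the point $\xi$ exists; in a concrete case it can be located numerically. The sketch below is an illustration under our own assumptions: the function $f(x) = x^3$ on $[0, 2]$ is our choice, and bisection is used only because here $f'(x)$ minus the chord slope changes sign on the interval, which need not hold in general.

```python
def mean_value_point(f_prime, slope, a, b, tol=1e-12):
    """Bisection for a root of f'(x) - slope on [a, b].
    Assumes that f'(x) - slope has opposite signs at a and b."""
    def g(x):
        return f_prime(x) - slope
    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("no sign change of f' - slope on [a, b]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def f(x):
    return x ** 3

def f_prime(x):
    return 3 * x ** 2

a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)  # slope of the chord, here equal to 4
xi = mean_value_point(f_prime, slope, a, b)
print(xi, f_prime(xi), slope)
```

For this example the equation $3\xi^2 = 4$ can also be solved by hand, giving $\xi = 2/\sqrt{3}$.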


Remarks on Lagrange's Theorem 1°. In geometric language Lagrange's 
theorem means (see Fig. 5.9) that at some point $(\xi, f(\xi))$, where $\xi \in\, ]a, b[$, 
the tangent to the graph of the function is parallel to the chord joining the 
points $(a, f(a))$ and $(b, f(b))$, since the slope of the chord equals $\frac{f(b) - f(a)}{b - a}$. 

Fig. 5.9. 


2°. If $x$ is interpreted as time and $f(b) - f(a)$ as the amount of displacement 
over the time $b - a$ of a particle moving along a line, Lagrange's theorem says 
that the velocity $f'(\xi)$ of the particle at some time $\xi \in\, ]a, b[$ is such that if 
the particle had moved with the constant velocity $f'(\xi)$ over the whole time 
interval, it would have been displaced by the same amount $f(b) - f(a)$. It is 
natural to call $f'(\xi)$ the average velocity over the time interval $[a, b]$. 


3°. We note nevertheless that for motion that is not along a straight line 
there may be no average speed in the sense of Remark 2°. Indeed, suppose 
the particle is moving around a circle of unit radius at constant angular 
velocity $\omega = 1$. Its law of motion, as we know, can be written as 

$$r(t) = (\cos t, \sin t).$$

Then 

$$\dot{r}(t) = v(t) = (-\sin t, \cos t)$$

and $|v| = \sqrt{\sin^2 t + \cos^2 t} = 1$. 
The particle is at the same point $r(0) = r(2\pi) = (1, 0)$ at times $t = 0$ and 
$t = 2\pi$, and the equality 

$$r(2\pi) - r(0) = v(\xi)(2\pi - 0)$$

would mean that $v(\xi) = 0$. But this is impossible. 

Even so, we shall learn that there is still a relation between the displacement 
over a time interval and the velocity. It consists of the fact that the full 
length $L$ of the path traversed cannot exceed the maximal absolute value of 
the velocity multiplied by the time interval of the displacement. What has 
just been said can be written in the following more precise form: 

$$|r(b) - r(a)| \le \sup_{t \in\, ]a, b[}|\dot{r}(t)|\,|b - a|. \tag{5.47}$$

As will be shown later, this natural inequality does indeed always hold. 
It is also called Lagrange's finite-increment theorem, while relation (5.46), 
which is valid only for numerical-valued functions, is often called the Lagrange 
mean-value theorem (the role of the mean in this case is played by both the 
value $f'(\xi)$ of the velocity and by the point $\xi$ between $a$ and $b$). 


4°. Lagrange's theorem is important in that it connects the increment of a 
function over a finite interval with the derivative of the function on that 
interval. Up to now we have not had such a theorem on finite increments and 
have characterized only the local (infinitesimal) increment of a function in 
terms of the derivative or differential at a given point. 


Corollaries of Lagrange's Theorem 


Corollary 1. (Criterion for monotonicity of a function). If the derivative of 
a function is nonnegative (resp. positive) at every point of an open interval, 
then the function is nondecreasing (resp. increasing) on that interval. 


Proof. Indeed, if $x_1$ and $x_2$ are two points of the interval and $x_1 < x_2$, that 
is, $x_2 - x_1 > 0$, then by formula (5.46) 

$$f(x_2) - f(x_1) = f'(\xi)(x_2 - x_1), \quad\text{where } x_1 < \xi < x_2,$$

and therefore, the sign of the difference on the left-hand side of this equality 
is the same as the sign of $f'(\xi)$. □ 


Naturally an analogous assertion can be made about the nonincreasing 
(resp. decreasing) nature of a function with a nonpositive (resp. negative) 
derivative. 


Remark. By the inverse function theorem and Corollary 1 we can conclude, 
in particular, that if a numerical-valued function $f(x)$ on some interval $I$ 
has a derivative that is always positive or always negative, then the function 
is continuous and monotonic on $I$ and has an inverse function $f^{-1}$ that is 
defined on the interval $I' = f(I)$ and is differentiable on it. 


Corollary 2. (Criterion for a function to be constant). A function that is 
continuous on a closed interval $[a, b]$ is constant on it if and only if its derivative 
equals zero at every point of the interval $[a, b]$ (or only the open interval 
$]a, b[$). 


Proof. Only the fact that $f'(x) \equiv 0$ on $]a, b[$ implies that $f(x_1) = f(x_2)$ for 
all $x_1, x_2 \in [a, b]$ is of interest. But this follows from Lagrange's formula, 
according to which 

$$f(x_2) - f(x_1) = f'(\xi)(x_2 - x_1) = 0,$$

since $\xi$ lies between $x_1$ and $x_2$, that is, $\xi \in\, ]a, b[$, and so $f'(\xi) = 0$. □ 


Remark. From this we can draw the following conclusion (which, as we shall 
see, is very important for integral calculus): If the derivatives $F_1'(x)$ and 
$F_2'(x)$ of two functions $F_1(x)$ and $F_2(x)$ are equal on some interval, that is, 
$F_1'(x) = F_2'(x)$ on the interval, then the difference $F_1(x) - F_2(x)$ is constant. 


The following proposition is a useful generalization of Lagrange’s theorem, 
and is also based on Rolle’s theorem. 


Proposition 2. (Cauchy's finite-increment theorem). Let $x = x(t)$ and 
$y = y(t)$ be functions that are continuous on a closed interval $[\alpha, \beta]$ and 
differentiable on the open interval $]\alpha, \beta[$. 
Then there exists a point $\tau \in [\alpha, \beta]$ such that 

$$x'(\tau)(y(\beta) - y(\alpha)) = y'(\tau)(x(\beta) - x(\alpha)).$$

If in addition $x'(t) \ne 0$ for each $t \in\, ]\alpha, \beta[$, then $x(\alpha) \ne x(\beta)$ and we have the 
equality 

$$\frac{y(\beta) - y(\alpha)}{x(\beta) - x(\alpha)} = \frac{y'(\tau)}{x'(\tau)}. \tag{5.48}$$

Proof. The function $F(t) = x(t)(y(\beta) - y(\alpha)) - y(t)(x(\beta) - x(\alpha))$ satisfies the 
hypotheses of Rolle's theorem on the closed interval $[\alpha, \beta]$. Therefore there 
exists a point $\tau \in\, ]\alpha, \beta[$ at which $F'(\tau) = 0$, which is equivalent to the equality 
to be proved. To obtain relation (5.48) from it, it remains only to observe 
that if $x'(t) \ne 0$ on $]\alpha, \beta[$, then $x(\alpha) \ne x(\beta)$, again by Rolle's theorem. □ 


Remarks on Cauchy’s Theorem 1°. If we regard the pair $x(t)$, $y(t)$ as
the law of motion of a particle, then $(x'(t), y'(t))$ is its velocity vector at
time $t$, and $(x(\beta) - x(\alpha), y(\beta) - y(\alpha))$ is its displacement vector over the
time interval $[\alpha, \beta]$. The theorem then asserts that at some instant of time
$\tau \in [\alpha, \beta]$ these two vectors are collinear. However, this fact, which applies to
motion in a plane, is the same kind of pleasant exception as the mean-velocity
theorem in the case of motion along a line. Indeed, imagine a particle moving
at uniform speed along a helix. Its velocity makes a constant nonzero angle
with the vertical, while the displacement vector can be purely vertical (after
one complete turn).


2°. Lagrange’s formula can be obtained from Cauchy’s by setting $x = x(t) = t$,
$y(t) = y(x) = f(x)$, $\alpha = a$, $\beta = b$.
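
Cauchy’s theorem can be checked numerically on a concrete curve. The sketch below is only an illustration: the curve $x(t) = t^2$, $y(t) = t^3$ on $[0, 1]$ and the helper `bisect` are assumptions chosen for the example, not part of the text. It locates a point $\tau$ at which the velocity and the displacement are collinear and compares the two sides of (5.48).

```python
def bisect(g, lo, hi, tol=1e-12):
    """Find a root of g on [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_lo > 0) == (g_mid > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative curve (an assumption, not from the text): x(t) = t^2, y(t) = t^3.
alpha, beta = 0.0, 1.0
x = lambda t: t * t
y = lambda t: t ** 3
dx = lambda t: 2 * t        # x'(t)
dy = lambda t: 3 * t * t    # y'(t)

dX = x(beta) - x(alpha)     # displacement components
dY = y(beta) - y(alpha)

# G vanishes exactly where velocity and displacement are collinear.
G = lambda t: dx(t) * dY - dy(t) * dX

tau = bisect(G, 0.1, beta)  # G(0.1) > 0 > G(1), so a root lies in between
print(tau)                  # numerically close to 2/3
print(dY / dX, dy(tau) / dx(tau))  # the two sides of (5.48)
```

For this curve the equation $2\tau = 3\tau^2$ can also be solved by hand, which gives $\tau = 2/3$ inside the interval.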


5.3.3 Taylor’s Formula 


From the amount of differential calculus that has been explained up to this 
point one may obtain the correct impression that the more derivatives of 
two functions coincide (including the derivative of zeroth order) at a point, 
the better these functions approximate each other in a neighborhood of that 
point. We have mostly been interested in approximations of a function in the
neighborhood of a point by a polynomial
$P_n(x) = P_n(x_0; x) = c_0 + c_1(x - x_0) + \cdots + c_n(x - x_0)^n$, and that will continue
to be our main interest. We
know (see Example 25 in Subsect. 5.2.6) that an algebraic polynomial can be
represented as

$$P_n(x) = P_n(x_0) + \frac{P_n'(x_0)}{1!}(x - x_0) + \cdots + \frac{P_n^{(n)}(x_0)}{n!}(x - x_0)^n,$$

that is, $c_k = \frac{P_n^{(k)}(x_0)}{k!}$ $(k = 0, 1, \ldots, n)$. This can easily be verified directly.
Thus, if we are given a function $f(x)$ having derivatives up to order $n$
inclusive at $x_0$, we can immediately write the polynomial

$$P_n(x_0; x) = P_n(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n, \tag{5.49}$$

whose derivatives up to order $n$ inclusive at the point $x_0$ are the same as the
corresponding derivatives of $f(x)$ at that point.


Definition 5. The algebraic polynomial given by (5.49) is the Taylor¹¹
polynomial of order $n$ of $f(x)$ at $x_0$.


We shall be interested in the value

$$f(x) - P_n(x_0; x) = r_n(x_0; x) \tag{5.50}$$

of the discrepancy between the polynomial $P_n(x)$ and the function $f(x)$,
which is often called the remainder, more precisely, the $n$th remainder or the
$n$th remainder term in Taylor’s formula:

$$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + r_n(x_0; x). \tag{5.51}$$

The equality (5.51) itself is of course of no interest if we know nothing
more about the function $r_n(x_0; x)$ than its definition (5.50).
We shall now use a highly artificial device to obtain information on the
remainder term. A more natural route to this information will come from the
integral calculus.


Theorem 2. If the function $f$ is continuous on the closed interval with
endpoints $x_0$ and $x$ along with its first $n$ derivatives, and it has a derivative of
order $n + 1$ at the interior points of this interval, then for any function $\varphi$
that is continuous on this closed interval and has a nonzero derivative at its
interior points, there exists a point $\xi$ between $x_0$ and $x$ such that

$$r_n(x_0; x) = \frac{\varphi(x) - \varphi(x_0)}{\varphi'(\xi)\, n!}\, f^{(n+1)}(\xi)(x - \xi)^n. \tag{5.52}$$


Proof. On the closed interval $I$ with endpoints $x_0$ and $x$ we consider the
auxiliary function

$$F(t) = f(x) - P_n(t; x) \tag{5.53}$$

of the argument $t$. We now write out the definition of the function $F(t)$ in
more detail:

$$F(t) = f(x) - \left[f(t) + \frac{f'(t)}{1!}(x - t) + \cdots + \frac{f^{(n)}(t)}{n!}(x - t)^n\right]. \tag{5.54}$$


11 B, Taylor (1685-1731) — British mathematician. 
We see from the definition of the function $F(t)$ and the hypotheses of the
theorem that $F$ is continuous on the closed interval $I$ and differentiable at
its interior points, and that

$$F'(t) = -\left[f'(t) - \frac{f'(t)}{1!} + \frac{f''(t)}{1!}(x - t) - \frac{f''(t)}{1!}(x - t) + \frac{f'''(t)}{2!}(x - t)^2 - \cdots + \frac{f^{(n+1)}(t)}{n!}(x - t)^n\right] = -\frac{f^{(n+1)}(t)}{n!}(x - t)^n.$$


Applying Cauchy’s theorem to the pair of functions $F(t)$, $\varphi(t)$ on the
closed interval $I$ (see relation (5.48)), we find a point $\xi$ between $x_0$ and $x$ at
which

$$\frac{F(x) - F(x_0)}{\varphi(x) - \varphi(x_0)} = \frac{F'(\xi)}{\varphi'(\xi)}.$$

Substituting the expression for $F'(\xi)$ here and observing from comparison
of formulas (5.50), (5.53) and (5.54) that $F(x) - F(x_0) = 0 - F(x_0) =
-r_n(x_0; x)$, we obtain formula (5.52). □


Setting $\varphi(t) = x - t$ in (5.52), we obtain the following corollary.


Corollary 1. (Cauchy’s formula for the remainder term).

$$r_n(x_0; x) = \frac{1}{n!} f^{(n+1)}(\xi)(x - \xi)^n (x - x_0). \tag{5.55}$$


A particularly elegant formula results if we set $\varphi(t) = (x - t)^{n+1}$ in (5.52):


Corollary 2. (The Lagrange form of the remainder).

$$r_n(x_0; x) = \frac{1}{(n+1)!} f^{(n+1)}(\xi)(x - x_0)^{n+1}. \tag{5.56}$$


We remark that when $x_0 = 0$ Taylor’s formula (5.51) is often called
MacLaurin’s formula.¹²
Let us consider some examples. 


Example 3. For the function $f(x) = e^x$ with $x_0 = 0$ Taylor’s formula has the
form

$$e^x = 1 + \frac{1}{1!}x + \frac{1}{2!}x^2 + \cdots + \frac{1}{n!}x^n + r_n(0; x), \tag{5.57}$$

and by (5.56) we can assume that

$$r_n(0; x) = \frac{1}{(n+1)!}\, e^{\xi} x^{n+1},$$

where $|\xi| < |x|$.


12 C. MacLaurin (1698-1746) — British mathematician. 
Thus

$$|r_n(0; x)| = \frac{1}{(n+1)!}\, e^{\xi} |x|^{n+1} \le \frac{|x|^{n+1}}{(n+1)!}\, e^{|x|}. \tag{5.58}$$

But for each fixed $x \in \mathbb{R}$, if $n \to \infty$, the quantity $\frac{|x|^{n+1}}{(n+1)!}$, as we know (see
Example 12 of Subsect. 3.1.3), tends to zero. Hence it follows from the estimate
(5.58) and the definition of the sum of a series that

$$e^x = 1 + \frac{1}{1!}x + \frac{1}{2!}x^2 + \cdots + \frac{1}{n!}x^n + \cdots \tag{5.59}$$

for all $x \in \mathbb{R}$.
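
The estimate (5.58) is easy to test numerically. The following is a minimal sketch; the function names `exp_taylor` and `remainder_bound` and the sample point $x = 2$ are assumptions made for the illustration. It compares the actual error of the Taylor polynomial of $e^x$ with the right-hand side of (5.58).

```python
import math

def exp_taylor(x, n):
    """Taylor polynomial of e^x at 0 of order n, summed term by term."""
    term, total = 1.0, 1.0
    for k in range(1, n + 1):
        term *= x / k          # term is now x^k / k!
        total += term
    return total

def remainder_bound(x, n):
    """Right-hand side of estimate (5.58): |x|^(n+1) / (n+1)! * e^|x|."""
    return abs(x) ** (n + 1) / math.factorial(n + 1) * math.exp(abs(x))

x = 2.0
for n in (2, 5, 10, 15):
    err = abs(math.exp(x) - exp_taylor(x, n))
    print(n, err, remainder_bound(x, n))
```

In each printed row the middle number (the actual error) should not exceed the last one (the bound), and both shrink rapidly as $n$ grows.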


Example 4. We obtain the expansion of the function $a^x$ for any $a$, $0 < a$,
$a \neq 1$, similarly:

$$a^x = 1 + \frac{\ln a}{1!}x + \frac{\ln^2 a}{2!}x^2 + \cdots + \frac{\ln^n a}{n!}x^n + \cdots.$$


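Example 5. Let $f(x) = \sin x$. We know (see Example 18 of Subsect. 5.2.6)
that $f^{(n)}(x) = \sin\left(x + \frac{\pi}{2}n\right)$, $n \in \mathbb{N}$, and so by Lagrange’s formula (5.56) with
$x_0 = 0$ and any $x \in \mathbb{R}$ we find

$$r_n(0; x) = \frac{1}{(n+1)!} \sin\left(\xi + \frac{\pi}{2}(n+1)\right) x^{n+1}, \tag{5.60}$$

from which it follows that $r_n(0; x)$ tends to zero for any $x \in \mathbb{R}$ as $n \to \infty$.
Thus we have the expansion

$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots + \frac{(-1)^n}{(2n+1)!}x^{2n+1} + \cdots \tag{5.61}$$

for every $x \in \mathbb{R}$.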


Example 6. Similarly, for the function $f(x) = \cos x$, we obtain

$$r_n(0; x) = \frac{1}{(n+1)!} \cos\left(\xi + \frac{\pi}{2}(n+1)\right) x^{n+1} \tag{5.62}$$

and

$$\cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots + \frac{(-1)^n}{(2n)!}x^{2n} + \cdots \tag{5.63}$$

for $x \in \mathbb{R}$.


Example 7. Since $\sinh' x = \cosh x$ and $\cosh' x = \sinh x$, formula (5.56) yields
the following expression for the remainder in Taylor’s formula for $f(x) =
\sinh x$:

$$r_n(0; x) = \frac{1}{(n+1)!} f^{(n+1)}(\xi)\, x^{n+1},$$

where $f^{(n+1)}(\xi) = \sinh \xi$ if $n$ is odd and $f^{(n+1)}(\xi) = \cosh \xi$ if $n$ is even.
In any case $|f^{(n+1)}(\xi)| \le \max\{|\sinh x|, |\cosh x|\}$, since $|\xi| < |x|$. Hence for
any given value $x \in \mathbb{R}$ we have $r_n(0; x) \to 0$ as $n \to \infty$, and we obtain the
expansion

$$\sinh x = x + \frac{1}{3!}x^3 + \frac{1}{5!}x^5 + \cdots + \frac{1}{(2n+1)!}x^{2n+1} + \cdots, \tag{5.64}$$

valid for all $x \in \mathbb{R}$.
Example 8. Similarly we obtain the expansion

$$\cosh x = 1 + \frac{1}{2!}x^2 + \frac{1}{4!}x^4 + \cdots + \frac{1}{(2n)!}x^{2n} + \cdots, \tag{5.65}$$

valid for any $x \in \mathbb{R}$.


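Example 9. For the function $f(x) = \ln(1 + x)$ we have $f^{(n)}(x) = \frac{(-1)^{n-1}(n-1)!}{(1+x)^n}$,
so that Taylor’s formula for this function at $x_0 = 0$ is

$$\ln(1 + x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \cdots + \frac{(-1)^{n-1}}{n}x^n + r_n(0; x). \tag{5.66}$$

This time we represent $r_n(0; x)$ using Cauchy’s formula (5.55):

$$r_n(0; x) = \frac{1}{n!} \cdot \frac{(-1)^n n!}{(1+\xi)^{n+1}} (x - \xi)^n x,$$

or

$$r_n(0; x) = (-1)^n \frac{x}{1+\xi} \left(\frac{x - \xi}{1 + \xi}\right)^n, \tag{5.67}$$

where $\xi$ lies between 0 and $x$.
If $|x| < 1$, it follows from the condition that $\xi$ lies between 0 and $x$ that

$$\left|\frac{x - \xi}{1 + \xi}\right| = \frac{|x| - |\xi|}{|1 + \xi|} \le \frac{|x| - |\xi|}{1 - |\xi|} = 1 - \frac{1 - |x|}{1 - |\xi|} \le 1 - \frac{1 - |x|}{1 - |0|} = |x|. \tag{5.68}$$

Thus for $|x| < 1$, since $|1 + \xi| \ge 1 - |x|$,

$$|r_n(0; x)| \le \frac{|x|^{n+1}}{1 - |x|}, \tag{5.69}$$

and consequently the following expansion is valid for $|x| < 1$:

$$\ln(1 + x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \cdots + \frac{(-1)^{n-1}}{n}x^n + \cdots. \tag{5.70}$$

We remark that outside the closed interval $|x| \le 1$ the series on the right-hand
side of (5.70) diverges at every point, since its general term does not
tend to zero if $|x| > 1$.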
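
The contrast between the behavior of the series for $\ln(1 + x)$ inside and outside the interval $|x| < 1$ can be observed directly. The sketch below is illustrative only; the helper name `log1p_partial_sum` and the sample points $0.5$ and $1.5$ are assumptions made for the example.

```python
import math

def log1p_partial_sum(x, n):
    """Sum of the first n terms x - x^2/2 + ... + (-1)^(n-1) x^n / n."""
    return sum((-1) ** (k - 1) * x ** k / k for k in range(1, n + 1))

# Inside |x| < 1 the partial sums approach ln(1 + x).
x = 0.5
for n in (5, 10, 20, 40):
    print(n, abs(math.log1p(x) - log1p_partial_sum(x, n)))

# For |x| > 1 the absolute value |x|^n / n of the general term grows without bound.
x = 1.5
print([abs(x) ** n / n for n in (10, 20, 40)])
```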
Example 10. For the function $(1 + x)^{\alpha}$, where $\alpha \in \mathbb{R}$, we have $f^{(n)}(x) =
\alpha(\alpha - 1)\cdots(\alpha - n + 1)(1 + x)^{\alpha - n}$, so that Taylor’s formula at $x_0 = 0$ for this
function has the form

$$(1 + x)^{\alpha} = 1 + \frac{\alpha}{1!}x + \frac{\alpha(\alpha - 1)}{2!}x^2 + \cdots + \frac{\alpha(\alpha - 1)\cdots(\alpha - n + 1)}{n!}x^n + r_n(0; x). \tag{5.71}$$

Using Cauchy’s formula (5.55), we find

$$r_n(0; x) = \frac{\alpha(\alpha - 1)\cdots(\alpha - n)}{n!}(1 + \xi)^{\alpha - n - 1}(x - \xi)^n x, \tag{5.72}$$

where $\xi$ lies between 0 and $x$.
If $|x| < 1$, then, using the estimate (5.68), we have

$$|r_n(0; x)| \le \left|\alpha\left(1 - \frac{\alpha}{1}\right)\cdots\left(1 - \frac{\alpha}{n}\right)\right| (1 + \xi)^{\alpha - 1} |x|^{n+1}. \tag{5.73}$$

When $n$ is increased by 1, the right-hand side of Eq. (5.73) is multiplied
by $\left|\left(1 - \frac{\alpha}{n+1}\right)x\right|$. But since $|x| < 1$, we shall have $\left|\left(1 - \frac{\alpha}{n+1}\right)x\right| < q < 1$,
independently of the value of $\alpha$, provided $|x| < q < 1$ and $n$ is sufficiently
large.

It follows from this that $r_n(0; x) \to 0$ as $n \to \infty$ for any $\alpha \in \mathbb{R}$ and any
$x$ in the open interval $|x| < 1$. Therefore the expansion obtained by Newton
(Newton’s binomial theorem) is valid on the open interval $|x| < 1$:

$$(1 + x)^{\alpha} = 1 + \frac{\alpha}{1!}x + \frac{\alpha(\alpha - 1)}{2!}x^2 + \cdots + \frac{\alpha(\alpha - 1)\cdots(\alpha - n + 1)}{n!}x^n + \cdots. \tag{5.74}$$

We remark that d’Alembert’s test (see Paragraph b of Subsect. 3.1.4)
implies that for $|x| > 1$ the series (5.74) generally diverges if $\alpha \notin \mathbb{N}$. Let us
now consider separately the case when $\alpha = n \in \mathbb{N}$.

In this case $f(x) = (1 + x)^{\alpha} = (1 + x)^n$ is a polynomial of degree $n$
and hence all of its derivatives of order higher than $n$ are equal to 0. Therefore
Taylor’s formula, together with, for example, the Lagrange form of the
remainder, enables us to write the following equality:

$$(1 + x)^n = 1 + \frac{n}{1!}x + \frac{n(n - 1)}{2!}x^2 + \cdots + \frac{n(n - 1)\cdots 1}{n!}x^n, \tag{5.75}$$

which is the Newton binomial theorem known from high school for a
natural-number exponent:

$$(1 + x)^n = 1 + \binom{n}{1}x + \binom{n}{2}x^2 + \cdots + \binom{n}{n}x^n.$$
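
Both cases of the binomial expansion can be illustrated with a few lines of code. This is a sketch under stated assumptions: the helper names `binom_coeffs` and `binom_partial_sum` and the sample values $\alpha = 0.5$, $x = 0.3$ are chosen for the example and do not come from the text.

```python
import math

def binom_coeffs(alpha, n):
    """Coefficients alpha(alpha-1)...(alpha-k+1)/k! for k = 0, ..., n."""
    coeffs = [1.0]
    for k in range(1, n + 1):
        coeffs.append(coeffs[-1] * (alpha - k + 1) / k)
    return coeffs

def binom_partial_sum(alpha, x, n):
    """Partial sum of the binomial series up to the term in x^n."""
    return sum(c * x ** k for k, c in enumerate(binom_coeffs(alpha, n)))

# Real exponent, |x| < 1: partial sums of (5.74) approach (1 + x)^alpha.
alpha, x = 0.5, 0.3
for n in (2, 5, 10, 20):
    print(n, abs((1 + x) ** alpha - binom_partial_sum(alpha, x, n)))

# Natural exponent: the first n + 1 coefficients are the usual binomial
# coefficients, and every coefficient beyond k = n vanishes.
print(binom_coeffs(4, 6))
print([math.comb(4, k) for k in range(5)])
```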
Thus we have defined Taylor’s formula (5.51) and obtained the forms
(5.52), (5.55), and (5.56) for the remainder term in the formula. We have 
obtained the relations (5.58), (5.60), (5.62), (5.69), and (5.73), which enable 
us to estimate the error in computing the important elementary functions 
using Taylor’s formula. Finally, we have obtained the power-series expansions 
of these functions. 


Definition 6. If the function $f(x)$ has derivatives of all orders $n \in \mathbb{N}$ at a
point $x_0$, the series

$$f(x_0) + \frac{1}{1!}f'(x_0)(x - x_0) + \cdots + \frac{1}{n!}f^{(n)}(x_0)(x - x_0)^n + \cdots$$

is called the Taylor series of $f$ at the point $x_0$.


It should not be thought that the Taylor series of an infinitely differentiable
function always converges in some neighborhood of $x_0$, for given any sequence
$c_0, c_1, \ldots, c_n, \ldots$ of numbers, one can construct (although this is not simple
to do) a function $f(x)$ such that $f^{(n)}(x_0) = c_n$, $n \in \mathbb{N}$.

It should also not be thought that if the Taylor series converges, it neces- 
sarily converges to the function that generated it. A Taylor series converges 
to the function that generated it only when the generating function belongs 
to the class of so-called analytic functions. 

Here is Cauchy’s example of a nonanalytic function:

$$f(x) = \begin{cases} e^{-1/x^2}, & \text{if } x \neq 0, \\ 0, & \text{if } x = 0. \end{cases}$$

Starting from the definition of the derivative and the fact that
$x^k e^{-1/x^2} \to 0$ as $x \to 0$, independently of the value of $k$ (see Example 30
in Sect. 3.2), one can verify that $f^{(n)}(0) = 0$ for $n = 0, 1, 2, \ldots$. Thus, the
Taylor series in this case has all its terms equal to 0 and hence its sum is
identically equal to 0, while $f(x) \neq 0$ if $x \neq 0$.
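
A short numerical experiment makes the flatness of Cauchy’s function at the origin visible. The sketch below is only illustrative; the sample points and exponents are arbitrary choices, and a numerical table of course proves nothing about the limit.

```python
import math

def f(x):
    """Cauchy's example: e^(-1/x^2) for x != 0, and 0 at x = 0."""
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

# f(x) / x^k becomes tiny as x -> 0 for every fixed k, which is the reason
# all derivatives of f at 0 vanish.
for k in (1, 5, 10):
    print(k, [f(x) / x ** k for x in (0.5, 0.2, 0.1)])

# Every Taylor polynomial of f at 0 is identically zero, so the error of
# the Taylor approximation at x equals f(x) itself, which is positive.
print(f(0.5), f(1.0))
```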

In conclusion, we discuss a local version of Taylor’s formula. 

We return once again to the problem of the local representation of a
function $f : E \to \mathbb{R}$ by a polynomial, which we began to discuss in Subsect.
5.1.3. We wish to choose the polynomial $P_n(x_0; x) = c_0 + c_1(x - x_0) + \cdots +
c_n(x - x_0)^n$ so as to have

$$f(x) = P_n(x) + o\bigl((x - x_0)^n\bigr) \quad \text{as } x \to x_0,\ x \in E,$$

or, in more detail,

$$f(x) = c_0 + c_1(x - x_0) + \cdots + c_n(x - x_0)^n + o\bigl((x - x_0)^n\bigr) \quad \text{as } x \to x_0,\ x \in E. \tag{5.76}$$


We now state explicitly a proposition that has already been proved in all 
its essentials. 
Proposition 3. If there exists a polynomial $P_n(x_0; x) = c_0 + c_1(x - x_0) +
\cdots + c_n(x - x_0)^n$ satisfying condition (5.76), that polynomial is unique.


Proof. Indeed, from relation (5.76) we obtain the coefficients of the polynomial
successively and completely unambiguously:

$$\begin{aligned}
c_0 &= \lim_{E \ni x \to x_0} f(x), \\
c_1 &= \lim_{E \ni x \to x_0} \frac{f(x) - c_0}{x - x_0}, \\
&\;\;\vdots \\
c_n &= \lim_{E \ni x \to x_0} \frac{f(x) - \bigl[c_0 + \cdots + c_{n-1}(x - x_0)^{n-1}\bigr]}{(x - x_0)^n}.
\end{aligned}$$ □


We now prove the local version of Taylor’s theorem. 


Proposition 4. (The local Taylor formula). Let $E$ be a closed interval having
$x_0 \in \mathbb{R}$ as an endpoint. If the function $f : E \to \mathbb{R}$ has derivatives
$f'(x_0), \ldots, f^{(n)}(x_0)$ up to order $n$ inclusive at the point $x_0$, then the following
representation holds:

$$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + o\bigl((x - x_0)^n\bigr) \quad \text{as } x \to x_0,\ x \in E. \tag{5.77}$$


Thus the problem of the local approximation of a differentiable function 
is solved by the Taylor polynomial of the appropriate order. 

Since the Taylor polynomial $P_n(x_0; x)$ is constructed from the requirement
that its derivatives up to order $n$ inclusive must coincide with the corresponding
derivatives of the function $f$ at $x_0$, it follows that $f^{(k)}(x_0) - P_n^{(k)}(x_0; x_0) =
0$ $(k = 0, 1, \ldots, n)$ and the validity of formula (5.77) is established by the
following lemma.


Lemma 2. If a function $\varphi : E \to \mathbb{R}$, defined on a closed interval $E$ with
endpoint $x_0$, is such that it has derivatives up to order $n$ inclusive at $x_0$ and
$\varphi(x_0) = \varphi'(x_0) = \cdots = \varphi^{(n)}(x_0) = 0$, then $\varphi(x) = o\bigl((x - x_0)^n\bigr)$ as $x \to x_0$,
$x \in E$.


Proof. For $n = 1$ the assertion follows from the definition of differentiability
of the function $\varphi$ at $x_0$, by virtue of which

$$\varphi(x) = \varphi(x_0) + \varphi'(x_0)(x - x_0) + o(x - x_0) \quad \text{as } x \to x_0,\ x \in E,$$

and, since $\varphi(x_0) = \varphi'(x_0) = 0$, we have

$$\varphi(x) = o(x - x_0) \quad \text{as } x \to x_0,\ x \in E.$$

Suppose the assertion has been proved for order $n = k - 1 \ge 1$. We shall
show that it is then valid for order $n = k \ge 2$.
We make the preliminary remark that since

$$\varphi^{(k)}(x_0) = \bigl(\varphi^{(k-1)}\bigr)'(x_0) = \lim_{E \ni x \to x_0} \frac{\varphi^{(k-1)}(x) - \varphi^{(k-1)}(x_0)}{x - x_0},$$

the existence of $\varphi^{(k)}(x_0)$ presumes that the function $\varphi^{(k-1)}(x)$ is defined on
$E$, at least near the point $x_0$. Shrinking the closed interval $E$ if necessary,
we can assume from the outset that the functions $\varphi(x), \varphi'(x), \ldots, \varphi^{(k-1)}(x)$,
where $k \ge 2$, are all defined on the whole closed interval $E$ with endpoint $x_0$.
Since $k \ge 2$, the function $\varphi(x)$ has a derivative $\varphi'(x)$ on $E$, and by hypothesis

$$(\varphi')(x_0) = \cdots = (\varphi')^{(k-1)}(x_0) = 0.$$

Therefore, by the induction assumption,

$$\varphi'(x) = o\bigl((x - x_0)^{k-1}\bigr) \quad \text{as } x \to x_0,\ x \in E.$$

Then, using Lagrange’s theorem, we obtain

$$\varphi(x) = \varphi(x) - \varphi(x_0) = \varphi'(\xi)(x - x_0) = \alpha(\xi)(\xi - x_0)^{k-1}(x - x_0),$$

where $\xi$ lies between $x_0$ and $x$, that is, $|\xi - x_0| < |x - x_0|$, and $\alpha(\xi) \to 0$ as
$\xi \to x_0$, $\xi \in E$. Hence as $x \to x_0$, $x \in E$, we have simultaneously $\xi \to x_0$,
$\xi \in E$, and $\alpha(\xi) \to 0$. Since

$$|\varphi(x)| \le |\alpha(\xi)|\, |x - x_0|^{k-1} |x - x_0|,$$

we have verified that

$$\varphi(x) = o\bigl((x - x_0)^k\bigr) \quad \text{as } x \to x_0,\ x \in E.$$

Thus, the assertion of Lemma 2 has been verified by mathematical induction. □


Relation (5.77) is called the local Taylor formula since the form of the
remainder term given in it (the so-called Peano form)

$$r_n(x_0; x) = o\bigl((x - x_0)^n\bigr), \tag{5.78}$$

makes it possible to draw inferences only about the asymptotic nature of
the connection between the Taylor polynomial and the function as $x \to x_0$,
$x \in E$.

Formula (5.77) is therefore convenient in computing limits and describing
the asymptotic behavior of a function as $x \to x_0$, $x \in E$, but it cannot help
with the approximate computation of the values of the function until some
actual estimate of the quantity $r_n(x_0; x) = o\bigl((x - x_0)^n\bigr)$ is available.
Let us now summarize our results. We have defined the Taylor polynomial

$$P_n(x_0; x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n,$$

written the Taylor formula

$$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + r_n(x_0; x),$$

and obtained the following very important specific forms of it:

If $f$ has a derivative of order $n + 1$ on the open interval with endpoints
$x_0$ and $x$, then

$$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}, \tag{5.79}$$

where $\xi$ is a point between $x_0$ and $x$.
If $f$ has derivatives of orders up to $n \ge 1$ inclusive at the point $x_0$, then

$$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + o\bigl((x - x_0)^n\bigr). \tag{5.80}$$


Relation (5.79), called Taylor’s formula with the Lagrange form of the 
remainder, is obviously a generalization of Lagrange’s mean-value theorem, 
to which it reduces when n = 0. 

Relation (5.80), called Taylor’s formula with the Peano form of the re- 
mainder, is obviously a generalization of the definition of differentiability of 
a function at a point, to which it reduces when n = 1. 

We remark that formula (5.79) is nearly always the more substantive of
the two. For, on the one hand, as we have seen, it enables us to estimate
the absolute magnitude of the remainder term. On the other hand, when, for
example, $f^{(n+1)}(x)$ is bounded in a neighborhood of $x_0$, it also implies the
asymptotic formula

$$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + O\bigl((x - x_0)^{n+1}\bigr). \tag{5.81}$$

Thus for infinitely differentiable functions, with which classical analysis deals
in the overwhelming majority of cases, formula (5.79) contains the local
formula (5.80).
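In particular, on the basis of (5.81) and Examples 3-10 just studied, we
can now write the following table of asymptotic formulas as $x \to 0$:

$$\begin{aligned}
e^x &= 1 + \frac{1}{1!}x + \frac{1}{2!}x^2 + \cdots + \frac{1}{n!}x^n + O(x^{n+1}), \\
\cos x &= 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots + \frac{(-1)^n}{(2n)!}x^{2n} + O(x^{2n+2}), \\
\sin x &= x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots + \frac{(-1)^n}{(2n+1)!}x^{2n+1} + O(x^{2n+3}), \\
\cosh x &= 1 + \frac{1}{2!}x^2 + \frac{1}{4!}x^4 + \cdots + \frac{1}{(2n)!}x^{2n} + O(x^{2n+2}), \\
\sinh x &= x + \frac{1}{3!}x^3 + \frac{1}{5!}x^5 + \cdots + \frac{1}{(2n+1)!}x^{2n+1} + O(x^{2n+3}), \\
\ln(1 + x) &= x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \cdots + \frac{(-1)^{n-1}}{n}x^n + O(x^{n+1}), \\
(1 + x)^{\alpha} &= 1 + \frac{\alpha}{1!}x + \frac{\alpha(\alpha - 1)}{2!}x^2 + \cdots + \frac{\alpha(\alpha - 1)\cdots(\alpha - n + 1)}{n!}x^n + O(x^{n+1}).
\end{aligned}$$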
Let us now consider a few more examples of the use of Taylor’s formula. 


Example 11. We shall write a polynomial that makes it possible to compute
the values of $\sin x$ on the interval $-1 \le x \le 1$ with absolute error at most
$10^{-3}$.

One can take this polynomial to be a Taylor polynomial of suitable degree
obtained from the expansion of $\sin x$ in a neighborhood of $x_0 = 0$. Since

$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots + \frac{(-1)^n}{(2n+1)!}x^{2n+1} + 0 \cdot x^{2n+2} + r_{2n+2}(0; x),$$

where by Lagrange’s formula

$$r_{2n+2}(0; x) = \frac{\sin\left(\xi + \frac{\pi}{2}(2n + 3)\right)}{(2n + 3)!}\, x^{2n+3},$$

we have, for $|x| \le 1$,

$$|r_{2n+2}(0; x)| \le \frac{1}{(2n + 3)!}.$$

But $\frac{1}{(2n+3)!} < 10^{-3}$ for $n \ge 2$. Thus the approximation $\sin x \approx x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5$
has the required precision on the closed interval $|x| \le 1$.
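
The conclusion of Example 11 can be confirmed by sampling. The sketch below assumes the degree-5 polynomial $x - x^3/3! + x^5/5!$ of the example; the grid of 2001 points is an arbitrary choice, and sampling only suggests (it does not prove) the bound.

```python
import math

def sin_poly(x):
    """Degree-5 Taylor polynomial x - x^3/3! + x^5/5! of sin x at 0."""
    return x - x ** 3 / 6 + x ** 5 / 120

# Sample [-1, 1] on a uniform grid (the grid size is an arbitrary choice).
grid = [-1 + 2 * i / 2000 for i in range(2001)]
worst = max(abs(math.sin(x) - sin_poly(x)) for x in grid)

print(worst)                   # largest observed error on the grid
print(1 / math.factorial(7))   # bound 1/7! from the Lagrange remainder
```

The observed error stays below $1/7!$, which in turn is smaller than $10^{-3}$.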


Example 12. We shall show that $\tan x = x + \frac{1}{3}x^3 + o(x^3)$ as $x \to 0$. We have

$$\tan' x = \cos^{-2} x,$$

$$\tan'' x = 2\cos^{-3} x \sin x,$$

$$\tan''' x = 6\cos^{-4} x \sin^2 x + 2\cos^{-2} x.$$

Thus, $\tan 0 = 0$, $\tan' 0 = 1$, $\tan'' 0 = 0$, $\tan''' 0 = 2$, and the relation now
follows from the local Taylor formula.
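
The meaning of the symbol $o(x^3)$ in this relation can be illustrated numerically: the quotient of the remainder by $x^3$ should become small as $x \to 0$. The following sketch uses an assumed helper name `peano_ratio` and arbitrary sample points.

```python
import math

def peano_ratio(x):
    """(tan x - (x + x^3/3)) / x^3, which should tend to 0 as x -> 0."""
    return (math.tan(x) - (x + x ** 3 / 3)) / x ** 3

for x in (0.5, 0.1, 0.01):
    print(x, peano_ratio(x))
```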
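Example 13. Let $\alpha > 0$. Let us study the convergence of the series
$\sum_{n=1}^{\infty} \ln\cos\frac{1}{n^{\alpha}}$. For $\alpha > 0$ we have $\frac{1}{n^{\alpha}} \to 0$ as $n \to \infty$. Let us estimate the
order of a term of the series:

$$\ln\cos\frac{1}{n^{\alpha}} = \ln\left(1 - \frac{1}{2!} \cdot \frac{1}{n^{2\alpha}} + o\left(\frac{1}{n^{2\alpha}}\right)\right) = -\frac{1}{2} \cdot \frac{1}{n^{2\alpha}} + o\left(\frac{1}{n^{2\alpha}}\right).$$

Thus we have a series of terms of constant sign whose terms are equivalent
to those of the series $\sum_{n=1}^{\infty} \frac{-1}{2n^{2\alpha}}$. Since this last series converges only for $\alpha > \frac{1}{2}$,
when $\alpha > 0$ the original series converges only for $\alpha > \frac{1}{2}$ (see Problem 15b)
below).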


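Example 14. Let us show that $\ln\cos x = -\frac{1}{2}x^2 - \frac{1}{12}x^4 - \frac{1}{45}x^6 + O(x^8)$ as
$x \to 0$.
This time, instead of computing six successive derivatives, we shall use
the already-known expansions of $\cos x$ as $x \to 0$ and $\ln(1 + u)$ as $u \to 0$:

$$\begin{aligned}
\ln\cos x &= \ln\left(1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \frac{1}{6!}x^6 + O(x^8)\right) = \ln(1 + u) = \\
&= u - \frac{1}{2}u^2 + \frac{1}{3}u^3 + O(u^4) = \left(-\frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \frac{1}{6!}x^6 + O(x^8)\right) - \\
&\quad - \frac{1}{2}\left(\frac{1}{(2!)^2}x^4 - 2 \cdot \frac{1}{2!\,4!}x^6 + O(x^8)\right) + \frac{1}{3}\left(-\frac{1}{(2!)^3}x^6 + O(x^8)\right) = \\
&= -\frac{1}{2}x^2 - \frac{1}{12}x^4 - \frac{1}{45}x^6 + O(x^8).
\end{aligned}$$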


Example 15. Let us find the values of the first six derivatives of the function
$\ln\cos x$ at $x = 0$.

We have $(\ln\cos)' x = \frac{-\sin x}{\cos x}$, and it is therefore clear that the function
has derivatives of all orders at 0, since $\cos 0 \neq 0$. We shall not try to find
functional expressions for these derivatives, but rather we shall make use
of the uniqueness of the Taylor polynomial and the result of the preceding
example.

If

$$f(x) = c_0 + c_1 x + \cdots + c_n x^n + o(x^n) \quad \text{as } x \to 0,$$

then

$$c_k = \frac{f^{(k)}(0)}{k!} \quad \text{and} \quad f^{(k)}(0) = k!\, c_k.$$

Thus, in the present case we obtain

$$(\ln\cos)(0) = 0, \quad (\ln\cos)'(0) = 0, \quad (\ln\cos)''(0) = -\frac{1}{2} \cdot 2!,$$

$$(\ln\cos)^{(3)}(0) = 0, \quad (\ln\cos)^{(4)}(0) = -\frac{1}{12} \cdot 4!,$$

$$(\ln\cos)^{(5)}(0) = 0, \quad (\ln\cos)^{(6)}(0) = -\frac{1}{45} \cdot 6!.$$
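
The substitution of one expansion into another, as in Example 14, and the passage from coefficients to derivatives, as in Example 15, can be reproduced with exact rational arithmetic. The sketch below is an illustration under stated assumptions: polynomials are stored as coefficient lists, the cutoff `N = 8` and the helper `mul` are choices made for the example.

```python
from fractions import Fraction
from math import factorial

N = 8  # keep only powers of x below x^8 (assumed cutoff for this example)

def mul(p, q):
    """Product of two coefficient lists, dropping powers x^N and higher."""
    r = [Fraction(0)] * N
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < N:
                r[i + j] += a * b
    return r

# u = cos x - 1 = -x^2/2! + x^4/4! - x^6/6! + O(x^8)
u = [Fraction(0)] * N
for m in (1, 2, 3):
    u[2 * m] = Fraction((-1) ** m, factorial(2 * m))

# ln(1 + u) = u - u^2/2 + u^3/3 + O(u^4); here u^4 = O(x^8) is dropped.
u2 = mul(u, u)
u3 = mul(u2, u)
ln_cos = [a - b / 2 + c / 3 for a, b, c in zip(u, u2, u3)]

print(ln_cos[2], ln_cos[4], ln_cos[6])   # -1/2 -1/12 -1/45

# Derivatives at 0 via the relation f^(k)(0) = k! * c_k
print([factorial(k) * ln_cos[k] for k in (2, 4, 6)])
```

The last line gives the values $-1$, $-2$, $-16$, that is, $-\frac{1}{2} \cdot 2!$, $-\frac{1}{12} \cdot 4!$, and $-\frac{1}{45} \cdot 6!$.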
Example 16. Let $f(x)$ be an infinitely differentiable function at the point $x_0 = 0$,
and suppose we know the expansion

$$f'(x) = c_0' + c_1' x + \cdots + c_n' x^n + O(x^{n+1})$$

of its derivative in a neighborhood of zero. Then, from the uniqueness of the
Taylor expansion we have

$$(f')^{(k)}(0) = k!\, c_k',$$

and so $f^{(k+1)}(0) = k!\, c_k'$. Thus for the function $f(x)$ itself we have the
expansion

$$f(x) = f(0) + \frac{c_0'}{1!}x + \frac{1!\, c_1'}{2!}x^2 + \cdots + \frac{n!\, c_n'}{(n+1)!}x^{n+1} + O(x^{n+2}),$$

or, after simplification,

$$f(x) = f(0) + \frac{c_0'}{1}x + \frac{c_1'}{2}x^2 + \cdots + \frac{c_n'}{n+1}x^{n+1} + O(x^{n+2}).$$
Example 17. Let us find the Taylor expansion of the function $f(x) = \arctan x$
at 0.
Since $f'(x) = \frac{1}{1 + x^2} = (1 + x^2)^{-1} = 1 - x^2 + x^4 - \cdots + (-1)^n x^{2n} + O(x^{2n+2})$,
by the considerations explained in the preceding example,

$$f(x) = f(0) + \frac{1}{1}x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \cdots + \frac{(-1)^n}{2n+1}x^{2n+1} + O(x^{2n+3}),$$

that is,

$$\arctan x = x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \cdots + \frac{(-1)^n}{2n+1}x^{2n+1} + O(x^{2n+3}).$$
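
The passage from the coefficients of $f'$ to those of $f$ is a purely mechanical step, which the following sketch carries out for $\arctan x$. The helper name `integrate_coeffs`, the truncation of the derivative at $x^6$, and the sample point $x = 0.2$ are assumptions made for the illustration.

```python
from fractions import Fraction
import math

def integrate_coeffs(deriv_coeffs, f0=Fraction(0)):
    """Given the coefficients c'_0, ..., c'_n of f'(x), return those of f(x):
    f(0), c'_0/1, c'_1/2, ..., c'_n/(n+1)."""
    return [f0] + [Fraction(c) / (k + 1) for k, c in enumerate(deriv_coeffs)]

# f'(x) = 1/(1 + x^2) = 1 - x^2 + x^4 - x^6 + O(x^8)
deriv = [1, 0, -1, 0, 1, 0, -1]
arctan_coeffs = integrate_coeffs(deriv)
print(arctan_coeffs)

# Compare the resulting polynomial with the library arctangent.
x = 0.2
approx = sum(float(c) * x ** k for k, c in enumerate(arctan_coeffs))
print(approx, math.atan(x))
```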


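Example 18. Similarly, by expanding the function $\arcsin' x = (1 - x^2)^{-1/2}$
by Taylor’s formula in a neighborhood of zero, we find successively

$$(1 + u)^{-1/2} = 1 + \frac{-\frac{1}{2}}{1!}u + \frac{-\frac{1}{2}\left(-\frac{1}{2} - 1\right)}{2!}u^2 + \cdots + \frac{-\frac{1}{2}\left(-\frac{1}{2} - 1\right)\cdots\left(-\frac{1}{2} - n + 1\right)}{n!}u^n + O(u^{n+1}),$$

$$(1 - x^2)^{-1/2} = 1 + \frac{1}{2}x^2 + \frac{1 \cdot 3}{2^2 \cdot 2!}x^4 + \cdots + \frac{1 \cdot 3 \cdots (2n - 1)}{2^n \cdot n!}x^{2n} + O(x^{2n+2}),$$

$$\arcsin x = x + \frac{1}{2 \cdot 3}x^3 + \frac{1 \cdot 3}{2^2 \cdot 2! \cdot 5}x^5 + \cdots + \frac{(2n - 1)!!}{(2n)!!\,(2n + 1)}x^{2n+1} + O(x^{2n+3}),$$

or, after elementary transformations,

$$\arcsin x = x + \frac{1}{3!}x^3 + \frac{[3!!]^2}{5!}x^5 + \cdots + \frac{[(2n - 1)!!]^2}{(2n + 1)!}x^{2n+1} + O(x^{2n+3}).$$

Here $(2n - 1)!! := 1 \cdot 3 \cdots (2n - 1)$ and $(2n)!! := 2 \cdot 4 \cdots (2n)$.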
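
The two ways of writing the coefficient of $x^{2n+1}$ in the expansion of $\arcsin x$, namely $\frac{(2n-1)!!}{(2n)!!\,(2n+1)}$ and $\frac{[(2n-1)!!]^2}{(2n+1)!}$, can be compared exactly. The sketch below is only a check; the helper names and the sample point $x = 0.3$ are assumptions made for the example.

```python
from fractions import Fraction
from math import factorial, asin

def double_factorial(m):
    """m!! = m (m-2) (m-4) ... down to 1 or 2; by convention (-1)!! = 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def arcsin_coeff(n):
    """Coefficient of x^(2n+1) written as [(2n-1)!!]^2 / (2n+1)!."""
    return Fraction(double_factorial(2 * n - 1) ** 2, factorial(2 * n + 1))

def arcsin_coeff_alt(n):
    """The same coefficient written as (2n-1)!! / ((2n)!! (2n+1))."""
    return Fraction(double_factorial(2 * n - 1),
                    double_factorial(2 * n) * (2 * n + 1))

print([arcsin_coeff(n) for n in range(4)])
print(all(arcsin_coeff(n) == arcsin_coeff_alt(n) for n in range(10)))

# Compare an 8-term partial sum with the library arcsine.
x = 0.3
approx = sum(float(arcsin_coeff(n)) * x ** (2 * n + 1) for n in range(8))
print(approx, asin(x))
```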


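Example 19. We use the results of Examples 5, 12, 17, and 18 and find

$$\lim_{x \to 0} \frac{\arctan x - \sin x}{\tan x - \arcsin x} = \lim_{x \to 0} \frac{\left[x - \frac{1}{3}x^3 + O(x^5)\right] - \left[x - \frac{1}{3!}x^3 + O(x^5)\right]}{\left[x + \frac{1}{3}x^3 + O(x^5)\right] - \left[x + \frac{1}{3!}x^3 + O(x^5)\right]} = \lim_{x \to 0} \frac{-\frac{1}{6}x^3 + O(x^5)}{\frac{1}{6}x^3 + O(x^5)} = -1.$$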
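
A direct numerical evaluation agrees with the value $-1$ of this limit. The following sketch uses an assumed helper name `ratio` and arbitrary sample points; very small $x$ is avoided because both numerator and denominator are differences of nearly equal numbers.

```python
import math

def ratio(x):
    """(arctan x - sin x) / (tan x - arcsin x) for small nonzero x."""
    return (math.atan(x) - math.sin(x)) / (math.tan(x) - math.asin(x))

for x in (0.3, 0.1, 0.03, 0.01):
    print(x, ratio(x))
```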


5.3.4 Problems and Exercises 


1. Choose numbers $a$ and $b$ so that the function $f(x) = \cos x - \frac{1 + ax^2}{1 + bx^2}$ is an
infinitesimal of highest possible order as $x \to 0$.

2. Find $\lim_{x \to \infty} x\left[\frac{1}{e} - \left(\frac{x}{x+1}\right)^x\right]$.


3. Write a Taylor polynomial of $e^x$ at zero that makes it possible to compute the
values of $e^x$ on the closed interval $-1 \le x \le 2$ within $10^{-3}$.


4. Let $f$ be a function that is infinitely differentiable at 0. Show that
a) if $f$ is even, then its Taylor series at 0 contains only even powers of $x$;
b) if $f$ is odd, then its Taylor series at 0 contains only odd powers of $x$.


5. Show that if $f \in C^{(\infty)}[-1, 1]$ and $f^{(n)}(0) = 0$ for $n = 0, 1, 2, \ldots$, and there exists
a number $C$ such that $\sup_{-1 \le x \le 1} |f^{(n)}(x)| \le n!\,C$ for $n \in \mathbb{N}$, then $f \equiv 0$ on $[-1, 1]$.


6. Let $f \in C^{(n)}(]-1, 1[)$ and $\sup_{-1 < x < 1} |f(x)| \le 1$. Let $m_k(I) = \inf_{x \in I} |f^{(k)}(x)|$, where
$I$ is an interval contained in $]-1, 1[$. Show that

a) if $I$ is partitioned into three successive intervals $I_1$, $I_2$, and $I_3$ and $\mu$ is the
length of $I_2$, then

$$m_k(I) \le \frac{1}{\mu}\bigl(m_{k-1}(I_1) + m_{k-1}(I_3)\bigr);$$

b) if $I$ has length $\lambda$, then

$$m_k(I) \le \frac{2^{k(k+1)/2} k^k}{\lambda^k};$$

c) there exists a number $\alpha_n$ depending only on $n$ such that if $|f'(0)| \ge \alpha_n$, then
the equation $f^{(n)}(x) = 0$ has at least $n - 1$ distinct roots in $]-1, 1[$.

Hint: In part b) use part a) and mathematical induction; in c) use a) and prove
by induction that there exists a sequence $x_{k_1} < x_{k_2} < \cdots < x_{k_k}$ of points of the
open interval $]-1, 1[$ such that $f^{(k)}(x_{k_i}) \cdot f^{(k)}(x_{k_{i+1}}) < 0$ for $1 \le i \le k - 1$.
7. Show that if a function $f$ is defined and differentiable on an open interval $I$ and
$[a, b] \subset I$, then

a) the function $f'(x)$ (even if it is not continuous!) assumes on $[a, b]$ all the
values between $f'(a)$ and $f'(b)$ (the theorem of Darboux)¹³;

b) if $f''(x)$ also exists in $]a, b[$, then there is a point $\xi \in\, ]a, b[$ such that $f'(b) -
f'(a) = f''(\xi)(b - a)$.
8. A function $f(x)$ may be differentiable on the entire real line, without having a
continuous derivative $f'(x)$ (see Example 7 in Subsect. 5.1.5).

a) Show that $f'(x)$ can have only discontinuities of second kind.

b) Find the flaw in the following “proof” that $f'(x)$ is continuous.

Proof. Let $x_0$ be an arbitrary point on $\mathbb{R}$ and $f'(x_0)$ the derivative of $f$ at the point
$x_0$. By definition of the derivative and Lagrange’s theorem

$$f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = \lim_{x \to x_0} f'(\xi) = \lim_{\xi \to x_0} f'(\xi),$$

where $\xi$ is a point between $x_0$ and $x$ and therefore tends to $x_0$ as $x \to x_0$. □
9. Let $f$ be twice differentiable on an interval $I$. Let $M_0 = \sup_{x \in I} |f(x)|$, $M_1 =
\sup_{x \in I} |f'(x)|$ and $M_2 = \sup_{x \in I} |f''(x)|$. Show that

a) if $I = [-a, a]$, then

$$|f'(x)| \le \frac{M_0}{a} + \frac{x^2 + a^2}{2a} M_2;$$

b) $M_1 \le 2\sqrt{M_0 M_2}$, if the length of $I$ is not less than $2\sqrt{M_0 / M_2}$, and
$M_1 \le \sqrt{2 M_0 M_2}$, if $I = \mathbb{R}$;

c) the numbers 2 and $\sqrt{2}$ in part b) cannot be replaced by smaller numbers;

d) if $f$ is differentiable $p$ times on $\mathbb{R}$ and the quantities $M_0$ and $M_p =
\sup_{x \in \mathbb{R}} |f^{(p)}(x)|$ are finite, then the quantities $M_k = \sup_{x \in \mathbb{R}} |f^{(k)}(x)|$, $1 \le k < p$, are
also finite and

$$M_k \le 2^{k(p-k)/2} M_0^{1 - k/p} M_p^{k/p}.$$

Hint: Use Exercises 6b) and 9b) and mathematical induction.
10. Show that if a function $f$ has derivatives up to order $n + 1$ inclusive at a point
$x_0$ and $f^{(n+1)}(x_0) \neq 0$, then in the Lagrange form of the remainder in Taylor’s
formula

$$r_{n-1}(x_0; x) = \frac{1}{n!} f^{(n)}\bigl(x_0 + \theta(x - x_0)\bigr)(x - x_0)^n,$$

where $0 < \theta < 1$, the quantity $\theta = \theta(x)$ tends to $\frac{1}{n+1}$ as $x \to x_0$.


13 G. Darboux (1842-1917) — French mathematician. 
11. Let $f$ be a function that is differentiable $n$ times on an interval $I$. Prove the
following statements.

a) If $f$ vanishes at $(n + 1)$ points of $I$, there exists a point $\xi \in I$ such that
$f^{(n)}(\xi) = 0$.

b) If $x_1, x_2, \ldots, x_n$ are points of the interval $I$, there exists a unique polynomial
$L(x)$ (the Lagrange interpolation polynomial) of degree at most $(n - 1)$ such that
$f(x_i) = L(x_i)$, $i = 1, \ldots, n$. In addition, for $x \in I$ there exists a point $\xi \in I$ such
that

$$f(x) - L(x) = \frac{(x - x_1) \cdots (x - x_n)}{n!} f^{(n)}(\xi).$$

c) If $x_1 < x_2 < \cdots < x_p$ are points of $I$ and $n_i$, $1 \le i \le p$, are natural numbers
such that $n_1 + n_2 + \cdots + n_p = n$ and $f^{(k)}(x_i) = 0$ for $0 \le k \le n_i - 1$, then there
exists a point $\xi$ in the closed interval $[x_1, x_p]$ at which $f^{(n-1)}(\xi) = 0$.

d) There exists a unique polynomial $H(x)$ (the Hermite interpolating
polynomial)¹⁴ of degree $(n - 1)$ such that $f^{(k)}(x_i) = H^{(k)}(x_i)$ for $0 \le k \le n_i - 1$. Moreover,
inside the smallest interval containing the points $x$ and $x_i$, $i = 1, \ldots, p$, there is a
point $\xi$ such that

$$f(x) = H(x) + \frac{(x - x_1)^{n_1} \cdots (x - x_p)^{n_p}}{n!} f^{(n)}(\xi).$$

This formula is called the Hermite interpolation formula. The points $x_i$, $i =
1, \ldots, p$, are called interpolation nodes of multiplicity $n_i$ respectively. Special cases
of the Hermite interpolation formula are the Lagrange interpolation formula, which
is part b) of this exercise, and Taylor’s formula with the Lagrange form of the
remainder, which results when $p = 1$, that is, for interpolation with a single node
of multiplicity $n$.


12. Show that 


a) between two real roots of a polynomial P(x) with real coefficients there is a 
root of its derivative P’ (x); 

b) if the polynomial P(x) has a multiple root, the polynomial P’(x) has the 
same root, but its multiplicity as a root of P’(x) is one less than its multiplicity as 
a root of P(x); 


c) if $Q(x)$ is the greatest common divisor of the polynomials $P(x)$ and $P'(x)$,
where $P'(x)$ is the derivative of $P(x)$, then the polynomial $\frac{P(x)}{Q(x)}$ has the roots of
$P(x)$ as its roots, all of them being roots of multiplicity 1.


13. Show that 


a) any polynomial $P(x)$ admits a representation in the form $c_0 + c_1(x - x_0) +
\cdots + c_n(x - x_0)^n$;

b) there exists a unique polynomial of degree $n$ for which $f(x) - P(x) = o\bigl((x -
x_0)^n\bigr)$ as $E \ni x \to x_0$. Here $f$ is a function defined on a set $E$ and $x_0$ is a limit
point of $E$.


14 Ch. Hermite (1822-1901) — French mathematician who studied problems of anal- 
ysis; in particular, he proved that e is transcendental. 
14. Using induction on $k$, $1 \le k$, we define the finite differences of order $k$ of the
function $f$ at $x_0$:

$$\begin{aligned}
\Delta^1 f(x_0; h_1) &:= \Delta f(x_0; h_1) = f(x_0 + h_1) - f(x_0), \\
\Delta^2 f(x_0; h_1, h_2) &:= \Delta\Delta f(x_0; h_1, h_2) = \\
&= \bigl(f(x_0 + h_1 + h_2) - f(x_0 + h_2)\bigr) - \bigl(f(x_0 + h_1) - f(x_0)\bigr) = \\
&= f(x_0 + h_1 + h_2) - f(x_0 + h_1) - f(x_0 + h_2) + f(x_0), \\
&\;\;\vdots \\
\Delta^k f(x_0; h_1, \ldots, h_k) &:= \Delta^{k-1} g_k(x_0; h_1, \ldots, h_{k-1}),
\end{aligned}$$

where $g_k(x) = \Delta^1 f(x; h_k) = f(x + h_k) - f(x)$.

a) Let $f \in C^{(n-1)}[a, b]$ and suppose that $f^{(n)}(x)$ exists at least in the open
interval $]a, b[$. If all the points $x_0$, $x_0 + h_1$, $x_0 + h_2$, $x_0 + h_1 + h_2$, $\ldots$, $x_0 + h_1 + \cdots + h_n$
lie in $[a, b]$, then inside the smallest closed interval containing all of them there is a
point $\xi$ such that

$$\Delta^n f(x_0; h_1, \ldots, h_n) = f^{(n)}(\xi) h_1 \cdots h_n.$$

b) (Continuation.) If $f^{(n)}(x_0)$ exists, then the following estimate holds:

$$\bigl|\Delta^n f(x_0; h_1, \ldots, h_n) - f^{(n)}(x_0) h_1 \cdots h_n\bigr| \le \sup_{x \in ]a, b[} \bigl|f^{(n)}(x) - f^{(n)}(x_0)\bigr| \cdot |h_1| \cdots |h_n|.$$

c) (Continuation.) Set $\Delta^n f(x_0; h, \ldots, h) =: \Delta^n f(x_0; h^n)$. Show that if $f^{(n)}(x_0)$
exists, then

$$f^{(n)}(x_0) = \lim_{h \to 0} \frac{\Delta^n f(x_0; h^n)}{h^n}.$$

d) Show by example that the preceding limit may exist even when $f^{(n)}(x_0)$ does
not exist.
Hint: Consider, for example, $\Delta^2 f(0; h^2)$ for the function

$$f(x) = \begin{cases} x^3 \sin\frac{1}{x}, & x \neq 0, \\ 0, & x = 0, \end{cases}$$

and show that

$$\lim_{h \to 0} \frac{\Delta^2 f(0; h^2)}{h^2} = 0.$$


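15. a) Applying Lagrange’s theorem to the function $\frac{1}{x^{\alpha}}$, where $\alpha > 0$, show that
the inequality

$$\frac{1}{n^{1+\alpha}} < \frac{1}{\alpha}\left(\frac{1}{(n-1)^{\alpha}} - \frac{1}{n^{\alpha}}\right)$$

holds for $n \in \mathbb{N}$, $n \ge 2$, and $\alpha > 0$.

b) Use the result of a) to show that the series $\sum_{n=1}^{\infty} \frac{1}{n^{\sigma}}$ converges for $\sigma > 1$.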
5.4 The Study of Functions Using the Methods
of Differential Calculus


5.4.1 Conditions for a Function to be Monotonic 


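Proposition 1. The following relations hold between the monotonicity
properties of a function $f : E \to \mathbb{R}$ that is differentiable on an open interval
$]a, b[\, = E$ and the sign (positivity) of its derivative $f'$ on that interval:

$$\begin{aligned}
f'(x) > 0 &\Rightarrow f \text{ is increasing} &&\Rightarrow f'(x) \ge 0, \\
f'(x) \ge 0 &\Rightarrow f \text{ is nondecreasing} &&\Rightarrow f'(x) \ge 0, \\
f'(x) \equiv 0 &\Rightarrow f \equiv \text{const.} &&\Rightarrow f'(x) \equiv 0, \\
f'(x) \le 0 &\Rightarrow f \text{ is nonincreasing} &&\Rightarrow f'(x) \le 0, \\
f'(x) < 0 &\Rightarrow f \text{ is decreasing} &&\Rightarrow f'(x) \le 0.
\end{aligned}$$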

Proof. The left-hand column of implications is already known to us from
Lagrange’s theorem, by virtue of which $f(x_2) - f(x_1) = f'(\xi)(x_2 - x_1)$, where
$x_1, x_2 \in\, ]a, b[$ and $\xi$ is a point between $x_1$ and $x_2$. It can be seen from this
formula that for $x_1 < x_2$ the difference $f(x_2) - f(x_1)$ is positive if and only
if $f'(\xi)$ is positive.

The right-hand column of implications can be obtained immediately from
the definition of the derivative. Let us show, for example, that if a function
$f$ that is differentiable on $]a, b[$ is increasing, then $f'(x) \ge 0$ on $]a, b[$. Indeed,

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$

If $h > 0$, then $f(x + h) - f(x) > 0$; and if $h < 0$, then $f(x + h) - f(x) < 0$.
Therefore the fraction after the limit sign is positive.
Consequently, its limit $f'(x)$ is nonnegative, as asserted. □


Remark 1. It is clear from the example of the function $f(x) = x^3$ that a
strictly increasing function has a nonnegative derivative, not necessarily one
that is always positive. In this example, $f'(0) = 3x^2\big|_{x=0} = 0$.


Remark 2. In the expression A => B, as we noted at the appropriate point, 
A is a sufficient condition for B and B a necessary condition for A. Hence, 
one can make the following inferences from Proposition 1. 

A function is constant on an open interval if and only if its derivative is 
identically zero on that interval. 

A sufficient condition for a function that is differentiable on an open 
interval to be decreasing on that interval is that its derivative be negative at 
every point of the interval. 

A necessary condition for a function that is differentiable on an open interval to be nonincreasing on that interval is that its derivative be nonpositive on the interval.

Example 1. Let $f(x) = x^3 - 3x + 2$ on $\mathbb{R}$. Then $f'(x) = 3x^2 - 3 = 3(x^2 - 1)$, and since $f'(x) < 0$ for $|x| < 1$ and $f'(x) > 0$ for $|x| > 1$, we can say that the function is increasing on the open interval $]-\infty, -1[$, decreasing on $]-1, 1[$, and increasing again on $]1, +\infty[$.
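The sign pattern of the derivative in Example 1 can be spot-checked numerically. The following Python sketch is only an illustration, not part of the argument: the helper name `sign_on`, the sample count, and the finite windows standing in for the unbounded intervals are all assumptions made here.

```python
def f(x):
    return x**3 - 3*x + 2

def df(x):
    # Derivative computed by hand: f'(x) = 3x^2 - 3
    return 3*x**2 - 3

def sign_on(a, b, n=200):
    """Set of signs (+1, 0, -1) of f' at n interior sample points of ]a, b[."""
    pts = [a + (b - a) * (k + 0.5) / n for k in range(n)]
    return {(df(x) > 0) - (df(x) < 0) for x in pts}

# Finite windows standing in for ]-inf, -1[, ]-1, 1[ and ]1, +inf[
for a, b in [(-5.0, -1.0), (-1.0, 1.0), (1.0, 5.0)]:
    print((a, b), sign_on(a, b))
```

A single sign on each window is consistent with the monotonicity stated in the example; sampling cannot replace the proof, of course.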


5.4.2 Conditions for an Interior Extremum of a Function 


Taking account of Fermat’s lemma (Lemma 1 of Sect. 5.3), we can state the 
following proposition. 


Proposition 2. (Necessary conditions for an interior extremum). In order for a point $x_0$ to be an extremum of a function $f : U(x_0) \to \mathbb{R}$ defined on a neighborhood $U(x_0)$ of that point, a necessary condition is that one of the following two conditions hold: either the function is not differentiable at $x_0$ or $f'(x_0) = 0$.

Simple examples show that these necessary conditions are not sufficient.


Example 2. Let $f(x) = x^3$ on $\mathbb{R}$. Then $f'(0) = 0$, but there is no extremum at $x_0 = 0$.


Example 3. Let

$$
f(x) = \begin{cases} x & \text{for } x > 0,\\ 2x & \text{for } x < 0. \end{cases}
$$

This function has a bend at 0 and obviously has neither a derivative nor an extremum at 0.


Example 4. Let us find the maximum of $f(x) = x^2$ on the closed interval $[-2, 1]$. It is obvious in this case that the maximum will be attained at the endpoint $-2$, but here is a systematic procedure for finding the maximum. We find $f'(x) = 2x$, then we find all points of the open interval $]-2, 1[$ at which $f'(x) = 0$. In this case, the only such point is $x = 0$. The maximum of $f(x)$ must be either among the points where $f'(x) = 0$, or at one of the endpoints, about which Proposition 2 is silent. Thus we need to compare $f(-2) = 4$, $f(0) = 0$, and $f(1) = 1$, from which we conclude that the maximal value of $f(x) = x^2$ on the closed interval $[-2, 1]$ equals 4 and is assumed at $-2$, which is an endpoint of the interval.
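The systematic procedure of Example 4 amounts to comparing the values of the function at the endpoints and at the interior critical points. A minimal Python sketch, assuming the critical points have already been found by hand (the function name `max_on_closed_interval` is hypothetical):

```python
def max_on_closed_interval(f, a, b, critical_points):
    """Compare f at the endpoints a, b and at the interior critical points."""
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    best = max(candidates, key=f)
    return best, f(best)

# Example 4: f(x) = x^2 on [-2, 1], with the single critical point x = 0
point, value = max_on_closed_interval(lambda x: x**2, -2.0, 1.0, [0.0])
print(point, value)
```

The sketch assumes, as in the example, that the function is differentiable on the whole open interval; points of non-differentiability would have to be added to the candidate list.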


Using the connection established in Subsect. 5.4.1 between the sign of the derivative and the nature of the monotonicity of the function, we arrive at the following sufficient conditions for the presence or absence of a local extremum at a point.

Proposition 3. (Sufficient conditions for an extremum in terms of the first derivative). Let $f : U(x_0) \to \mathbb{R}$ be a function defined on a neighborhood $U(x_0)$ of the point $x_0$, which is continuous at the point itself and differentiable in a deleted neighborhood $\mathring{U}(x_0)$. Let $\mathring{U}^-(x_0) = \{x \in U(x_0) \mid x < x_0\}$ and $\mathring{U}^+(x_0) = \{x \in U(x_0) \mid x > x_0\}$.

Then the following conclusions are valid:

a) $\big(\forall x \in \mathring{U}^-(x_0)\ (f'(x) < 0)\big) \wedge \big(\forall x \in \mathring{U}^+(x_0)\ (f'(x) < 0)\big) \Rightarrow$ ($f$ has no extremum at $x_0$);

b) $\big(\forall x \in \mathring{U}^-(x_0)\ (f'(x) < 0)\big) \wedge \big(\forall x \in \mathring{U}^+(x_0)\ (f'(x) > 0)\big) \Rightarrow$ ($x_0$ is a strict local minimum of $f$);

c) $\big(\forall x \in \mathring{U}^-(x_0)\ (f'(x) > 0)\big) \wedge \big(\forall x \in \mathring{U}^+(x_0)\ (f'(x) < 0)\big) \Rightarrow$ ($x_0$ is a strict local maximum of $f$);

d) $\big(\forall x \in \mathring{U}^-(x_0)\ (f'(x) > 0)\big) \wedge \big(\forall x \in \mathring{U}^+(x_0)\ (f'(x) > 0)\big) \Rightarrow$ ($f$ has no extremum at $x_0$).


Briefly, but less precisely, one can say that if the derivative changes sign 
in passing through the point, then the point is an extremum, while if the 
derivative does not change sign, the point is not an extremum. 

We remark immediately, however, that these sufficient conditions are not 
necessary for an extremum, as one can verify using the following example. 


Example 5. Let

$$
f(x) = \begin{cases} 2x^2 + x^2 \sin\frac{1}{x} & \text{for } x \ne 0,\\ 0 & \text{for } x = 0. \end{cases}
$$

Since $x^2 \le f(x) \le 3x^2$, it is clear that the function has a strict local minimum at $x_0 = 0$, but the derivative $f'(x) = 4x + 2x \sin\frac{1}{x} - \cos\frac{1}{x}$ is not of constant sign in any deleted one-sided neighborhood of this point. This same example shows the misunderstandings that can arise in connection with the abbreviated statement of Proposition 3 just given.
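To see concretely why the derivative in Example 5 has no constant sign near 0, one can evaluate it at two sequences of points tending to 0 from the right. The Python sketch below is an illustration under assumptions made here: the choice of sample points $x = 1/(2\pi k)$ and $x = 1/((2k+1)\pi)$ and the helper name `sample_signs` are not from the text.

```python
import math

def df(x):
    # f'(x) = 4x + 2x sin(1/x) - cos(1/x) for x != 0
    return 4*x + 2*x*math.sin(1/x) - math.cos(1/x)

def sample_signs(k):
    """Signs of f' at two points to the right of 0 that shrink as k grows."""
    x_neg = 1 / (2 * math.pi * k)        # cos(1/x) = 1, so f'(x) is near 4x - 1 < 0
    x_pos = 1 / ((2 * k + 1) * math.pi)  # cos(1/x) = -1, so f'(x) is near 4x + 1 > 0
    return df(x_neg) < 0, df(x_pos) > 0

for k in (1, 10, 100, 1000):
    print(k, sample_signs(k))
```

Both signs occur arbitrarily close to 0, which is exactly the obstruction to applying Proposition 3 here.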


We now turn to the proof of Proposition 3. 


Proof. a) It follows from Proposition 1 that $f$ is strictly decreasing on $\mathring{U}^-(x_0)$. Since it is continuous at $x_0$, we have $\lim\limits_{\mathring{U}^-(x_0) \ni x \to x_0} f(x) = f(x_0)$, and consequently $f(x) > f(x_0)$ for $x \in \mathring{U}^-(x_0)$. By the same considerations we have $f(x_0) > f(x)$ for $x \in \mathring{U}^+(x_0)$. Thus the function is strictly decreasing in the whole neighborhood $U(x_0)$ and $x_0$ is not an extremum.

b) We conclude to begin with, as in a), that since $f(x)$ is decreasing on $\mathring{U}^-(x_0)$ and continuous at $x_0$, we have $f(x) > f(x_0)$ for $x \in \mathring{U}^-(x_0)$. We conclude from the increasing nature of $f$ on $\mathring{U}^+(x_0)$ that $f(x_0) < f(x)$ for $x \in \mathring{U}^+(x_0)$. Thus $f$ has a strict local minimum at $x_0$.

Statements c) and d) are proved similarly. □


Proposition 4. (Sufficient conditions for an extremum in terms of higher-order derivatives). Suppose a function $f : U(x_0) \to \mathbb{R}$ defined on a neighborhood $U(x_0)$ of $x_0$ has derivatives of order up to $n$ inclusive at $x_0$ ($n \ge 1$).

If $f'(x_0) = \cdots = f^{(n-1)}(x_0) = 0$ and $f^{(n)}(x_0) \ne 0$, then there is no extremum at $x_0$ if $n$ is odd. If $n$ is even, the point $x_0$ is a local extremum, in fact a strict local minimum if $f^{(n)}(x_0) > 0$ and a strict local maximum if $f^{(n)}(x_0) < 0$.


Proof. Using the local Taylor formula

$$
f(x) - f(x_0) = \frac{1}{n!} f^{(n)}(x_0)(x - x_0)^n + \alpha(x)(x - x_0)^n, \tag{5.82}
$$

where $\alpha(x) \to 0$ as $x \to x_0$, we shall reason as in the proof of Fermat's lemma. We rewrite Eq. (5.82) as

$$
f(x) - f(x_0) = \Big(\frac{1}{n!} f^{(n)}(x_0) + \alpha(x)\Big)(x - x_0)^n. \tag{5.83}
$$

Since $f^{(n)}(x_0) \ne 0$ and $\alpha(x) \to 0$ as $x \to x_0$, the sum $\frac{1}{n!} f^{(n)}(x_0) + \alpha(x)$ has the sign of $f^{(n)}(x_0)$ when $x$ is sufficiently close to $x_0$. If $n$ is odd, the factor $(x - x_0)^n$ changes sign when $x$ passes through $x_0$, and then the right-hand side of Eq. (5.83) also changes sign. Consequently, the left-hand side changes sign as well, and so for $n = 2k + 1$ there is no extremum.

If $n$ is even, then $(x - x_0)^n > 0$ for $x \ne x_0$ and hence in some small neighborhood of $x_0$ the sign of the difference $f(x) - f(x_0)$ is the same as the sign of $f^{(n)}(x_0)$, as is clear from Eq. (5.83). □
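The rule of Proposition 4 depends only on the order and the sign of the first nonvanishing derivative at the point. As a hedged illustration, the following Python sketch encodes that decision rule; the function name `classify_by_derivatives` and the convention of passing the derivative values as a list are assumptions made here.

```python
def classify_by_derivatives(derivs):
    """derivs[k-1] is the value of the k-th derivative at x0 (k = 1, 2, ...).

    Applies the rule of Proposition 4 to the first nonzero entry.
    """
    for n, d in enumerate(derivs, start=1):
        if d != 0:
            if n % 2 == 1:
                return "no extremum"
            return "strict local minimum" if d > 0 else "strict local maximum"
    return "undecided"  # every supplied derivative vanishes; the test is silent

print(classify_by_derivatives([0, 0, 6]))      # f(x) = x^3 at 0
print(classify_by_derivatives([0, 0, 0, 24]))  # f(x) = x^4 at 0
print(classify_by_derivatives([0, -2]))        # f(x) = -x^2 at 0
```

Exact comparison with zero is appropriate only when the derivative values are known exactly; with rounded data a tolerance would be needed.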


Let us now consider some examples. 


Example 6. The law of refraction in geometric optics (Snell's law).¹⁵ According to Fermat's principle, the actual trajectory of a light ray between two points is such that the ray requires minimum time to pass from one point to the other compared with all paths joining the two points.

It follows from Fermat’s principle and the fact that the shortest path 
between two points is a straight line segment having the points as endpoints 
that in a homogeneous and isotropic medium (having identical structure at 
each point and in each direction) light propagates in straight lines. 

Now consider two such media, and suppose that light propagates from point $A_1$ to $A_2$, as shown in Fig. 5.10.

15 W. Snell (1580-1626), Dutch astronomer and mathematician.

If $c_1$ and $c_2$ are the velocities of light in these media, the time required to traverse the path is

$$
t(x) = \frac{1}{c_1}\sqrt{h_1^2 + x^2} + \frac{1}{c_2}\sqrt{h_2^2 + (a - x)^2}.
$$

We now find the extremum of the function $t(x)$:

$$
t'(x) = \frac{1}{c_1}\,\frac{x}{\sqrt{h_1^2 + x^2}} - \frac{1}{c_2}\,\frac{a - x}{\sqrt{h_2^2 + (a - x)^2}} = 0,
$$

which in accordance with the notation of the figure, yields $c_1^{-1} \sin\alpha_1 = c_2^{-1} \sin\alpha_2$.

It is clear from physical considerations, or directly from the form of the function $t(x)$, which increases without bound as $x \to \infty$, that the point where $t'(x) = 0$ is an absolute minimum of the continuous function $t(x)$. Thus Fermat's principle implies the law of refraction $\dfrac{\sin\alpha_1}{\sin\alpha_2} = \dfrac{c_1}{c_2}$.
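The derivation can be checked numerically by locating the zero of $t'(x)$ and comparing the ratio of sines with $c_1/c_2$. The Python sketch below is an illustration under assumptions made here: the numerical values of $h_1, h_2, a, c_1, c_2$ are arbitrary, the angles are taken so that $\sin\alpha_1 = x/\sqrt{h_1^2 + x^2}$ and $\sin\alpha_2 = (a-x)/\sqrt{h_2^2 + (a-x)^2}$, and bisection is used only because $t'$ is increasing with $t'(0) < 0 < t'(a)$.

```python
import math

def travel_time(x, h1, h2, a, c1, c2):
    return math.hypot(h1, x) / c1 + math.hypot(h2, a - x) / c2

def dt(x, h1, h2, a, c1, c2):
    return x / (c1 * math.hypot(h1, x)) - (a - x) / (c2 * math.hypot(h2, a - x))

def crossing_point(h1, h2, a, c1, c2, tol=1e-12):
    """Bisection for t'(x) = 0 on [0, a]; t' is increasing, t'(0) < 0 < t'(a)."""
    lo, hi = 0.0, a
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if dt(mid, h1, h2, a, c1, c2) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h1, h2, a, c1, c2 = 1.0, 2.0, 3.0, 1.0, 0.75   # illustrative values only
x = crossing_point(h1, h2, a, c1, c2)
sin1 = x / math.hypot(h1, x)
sin2 = (a - x) / math.hypot(h2, a - x)
print(sin1 / sin2, c1 / c2)
```

The two printed numbers should agree to within the bisection tolerance.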


Example 7. We shall show that for $x > 0$

$$
x^{\alpha} - \alpha x + \alpha - 1 \le 0, \quad \text{when } 0 < \alpha < 1, \tag{5.84}
$$

$$
x^{\alpha} - \alpha x + \alpha - 1 \ge 0, \quad \text{when } \alpha < 0 \text{ or } 1 < \alpha. \tag{5.85}
$$

Proof. Differentiating the function $f(x) = x^{\alpha} - \alpha x + \alpha - 1$, we find $f'(x) = \alpha(x^{\alpha - 1} - 1)$ and $f'(x) = 0$ when $x = 1$. In passing through the point 1 the derivative passes from positive to negative values if $0 < \alpha < 1$ and from negative to positive values if $\alpha < 0$ or $\alpha > 1$. In the first case the point 1 is a strict maximum, and in the second case a strict minimum (and, as follows from the monotonicity of $f$ on the intervals $0 < x < 1$ and $1 < x$, not merely a local minimum). But $f(1) = 0$ and hence both inequalities (5.84) and (5.85) are established. In doing so, we have even shown that both inequalities are strict if $x \ne 1$. □
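Inequalities (5.84) and (5.85) are easy to probe numerically. In the following Python sketch the sample grid on $]0, 10]$, the chosen exponents, and the helper names `g` and `extreme_values` are assumptions made for illustration; the check complements, but does not replace, the proof.

```python
def g(x, alpha):
    """Left-hand side of (5.84) and (5.85)."""
    return x**alpha - alpha*x + alpha - 1

def extreme_values(alpha, xs):
    """Smallest and largest value of g(., alpha) over the sample points xs."""
    vals = [g(x, alpha) for x in xs]
    return min(vals), max(vals)

xs = [0.01 * k for k in range(1, 1001)]   # sample points in ]0, 10]
for alpha in (0.5, -0.5, 1.5):
    print(alpha, extreme_values(alpha, xs))
```

For $0 < \alpha < 1$ the largest sampled value should be 0 (attained at $x = 1$); for $\alpha < 0$ or $\alpha > 1$ the smallest sampled value should be 0.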


We remark that if $x$ is replaced by $1 + x$, we find that (5.84) and (5.85) are extensions of Bernoulli's inequality (Sect. 2.2; see also Problem 2 below), which we already know for a natural-number exponent $\alpha$.

By elementary algebraic transformations one can obtain a number of classical inequalities of great importance for analysis from the inequalities just proved. We shall now derive these inequalities.


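a. Young's inequalities.¹⁶ If $a > 0$ and $b > 0$, and the numbers $p$ and $q$ are such that $p \ne 0, 1$, $q \ne 0, 1$ and $\frac{1}{p} + \frac{1}{q} = 1$, then

$$
a^{1/p} b^{1/q} \le \frac{1}{p}\,a + \frac{1}{q}\,b, \quad \text{if } p > 1, \tag{5.86}
$$

$$
a^{1/p} b^{1/q} \ge \frac{1}{p}\,a + \frac{1}{q}\,b, \quad \text{if } p < 1, \tag{5.87}
$$

and equality holds in (5.86) and (5.87) only when $a = b$.


Proof. It suffices to set $x = \frac{a}{b}$ and $\alpha = \frac{1}{p}$ in (5.84) and (5.85), and then introduce the notation $\frac{1}{q} = 1 - \frac{1}{p}$. □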


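b. Hölder's inequalities.¹⁷ Let $x_i \ge 0$, $y_i \ge 0$, $i = 1, \ldots, n$, and $\frac{1}{p} + \frac{1}{q} = 1$. Then

$$
\sum_{i=1}^{n} x_i y_i \le \Big(\sum_{i=1}^{n} x_i^p\Big)^{1/p} \Big(\sum_{i=1}^{n} y_i^q\Big)^{1/q} \quad \text{for } p > 1, \tag{5.88}
$$

and

$$
\sum_{i=1}^{n} x_i y_i \ge \Big(\sum_{i=1}^{n} x_i^p\Big)^{1/p} \Big(\sum_{i=1}^{n} y_i^q\Big)^{1/q} \quad \text{for } p < 1,\ p \ne 0. \tag{5.89}
$$

In the case $p < 0$ it is assumed in (5.89) that $x_i > 0$ ($i = 1, \ldots, n$). Equality is possible in (5.88) and (5.89) only when the vectors $(x_1^p, \ldots, x_n^p)$ and $(y_1^q, \ldots, y_n^q)$ are proportional.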


Proof. Let us verify the inequality (5.88). Let $X = \sum\limits_{i=1}^{n} x_i^p > 0$ and $Y = \sum\limits_{i=1}^{n} y_i^q > 0$. Setting $a = \frac{x_i^p}{X}$ and $b = \frac{y_i^q}{Y}$ in (5.86), we obtain

$$
\frac{x_i y_i}{X^{1/p} Y^{1/q}} \le \frac{1}{p}\,\frac{x_i^p}{X} + \frac{1}{q}\,\frac{y_i^q}{Y}.
$$

Summing these inequalities over $i$ from 1 to $n$, we obtain

$$
\frac{\sum\limits_{i=1}^{n} x_i y_i}{X^{1/p} Y^{1/q}} \le 1,
$$

which is equivalent to relation (5.88).

We obtain (5.89) similarly from (5.87). Since equality occurs in (5.86) and (5.87) only when $a = b$, we conclude that it is possible in (5.88) and (5.89) only when a proportionality $x_i^p = \lambda y_i^q$ or $y_i^q = \lambda x_i^p$ holds. □


16 W. H. Young (1863-1942), British mathematician.
17 O. Hölder (1859-1937), German mathematician.


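c. Minkowski's inequalities.¹⁸ Let $x_i \ge 0$, $y_i \ge 0$, $i = 1, \ldots, n$. Then

$$
\Big(\sum_{i=1}^{n} (x_i + y_i)^p\Big)^{1/p} \le \Big(\sum_{i=1}^{n} x_i^p\Big)^{1/p} + \Big(\sum_{i=1}^{n} y_i^p\Big)^{1/p} \quad \text{when } p > 1, \tag{5.90}
$$

and

$$
\Big(\sum_{i=1}^{n} (x_i + y_i)^p\Big)^{1/p} \ge \Big(\sum_{i=1}^{n} x_i^p\Big)^{1/p} + \Big(\sum_{i=1}^{n} y_i^p\Big)^{1/p} \quad \text{when } p < 1,\ p \ne 0. \tag{5.91}
$$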


Proof. We apply Hölder's inequality to the terms on the right-hand side of the identity

$$
\sum_{i=1}^{n} (x_i + y_i)^p = \sum_{i=1}^{n} x_i (x_i + y_i)^{p-1} + \sum_{i=1}^{n} y_i (x_i + y_i)^{p-1}.
$$

The left-hand side is then bounded from above (for $p > 1$) or below (for $p < 1$) in accordance with inequalities (5.88) and (5.89) by the quantity

$$
\Big(\sum_{i=1}^{n} x_i^p\Big)^{1/p} \Big(\sum_{i=1}^{n} (x_i + y_i)^p\Big)^{1/q} + \Big(\sum_{i=1}^{n} y_i^p\Big)^{1/p} \Big(\sum_{i=1}^{n} (x_i + y_i)^p\Big)^{1/q}.
$$

After dividing these inequalities by $\Big(\sum\limits_{i=1}^{n} (x_i + y_i)^p\Big)^{1/q}$, we arrive at (5.90) and (5.91).

Knowing the conditions for equality in Hölder's inequalities, we verify that equality is possible in Minkowski's inequalities only when the vectors $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ are collinear. □


For n = 3 and p = 2, Minkowski’s inequality (5.90) is obviously the 
triangle inequality in three-dimensional Euclidean space. 
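Hölder's and Minkowski's inequalities can be spot-checked on concrete vectors. The Python sketch below is illustrative only; the helper names `p_norm`, `holder_gap`, `minkowski_gap` and the sample vectors are assumptions made here, and the entries are taken strictly positive so that negative exponents cause no trouble.

```python
def p_norm(v, p):
    return sum(t**p for t in v) ** (1 / p)

def holder_gap(x, y, p):
    """Right side minus left side of (5.88); nonnegative when p > 1."""
    q = p / (p - 1)                      # conjugate exponent, 1/p + 1/q = 1
    return p_norm(x, p) * p_norm(y, q) - sum(a * b for a, b in zip(x, y))

def minkowski_gap(x, y, p):
    """Right side minus left side of (5.90); nonnegative when p > 1."""
    s = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(s, p)

x, y = [1.0, 2.0, 3.0], [4.0, 0.5, 2.0]
print(holder_gap(x, y, 3.0), minkowski_gap(x, y, 3.0))   # both >= 0
print(holder_gap(x, y, 0.5), minkowski_gap(x, y, 0.5))   # both <= 0, cf. (5.89), (5.91)
```

For $p < 1$ the same gaps change sign, in agreement with the reversed inequalities; for collinear vectors the Minkowski gap vanishes up to rounding.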


Example 8. Let us consider another elementary example of the use of higher-order derivatives to find local extrema. Let $f(x) = \sin x$. Since $f'(x) = \cos x$ and $f''(x) = -\sin x$, all the points where $f'(x) = \cos x = 0$ are local extrema of $\sin x$, since $f''(x) = -\sin x \ne 0$ at these points. Here $f''(x) < 0$ if $\sin x > 0$ and $f''(x) > 0$ if $\sin x < 0$. Thus the points where $\cos x = 0$ and $\sin x > 0$ are local maxima and those where $\cos x = 0$ and $\sin x < 0$ are local minima for $\sin x$ (which, of course, was already well-known).


18 H. Minkowski (1864-1909), German mathematician who proposed a mathematical model adapted to the special theory of relativity (a space with a sign-indefinite metric).

5.4.3 Conditions for a Function to be Convex


Definition 1. A function $f : ]a,b[ \to \mathbb{R}$ defined on an open interval $]a,b[ \subset \mathbb{R}$ is convex if the inequalities

$$
f(\alpha_1 x_1 + \alpha_2 x_2) \le \alpha_1 f(x_1) + \alpha_2 f(x_2) \tag{5.92}
$$

hold for any points $x_1, x_2 \in ]a,b[$ and any numbers $\alpha_1 \ge 0$, $\alpha_2 \ge 0$ such that $\alpha_1 + \alpha_2 = 1$. If this inequality is strict whenever $x_1 \ne x_2$ and $\alpha_1 \alpha_2 \ne 0$, the function is strictly convex on $]a,b[$.


Geometrically, condition (5.92) for convexity of a function $f : ]a,b[ \to \mathbb{R}$ means that the points of any arc of the graph of the function lie below the chord subtended by the arc (see Fig. 5.11).

Fig. 5.11. The chord joining $(x_1, f(x_1))$ and $(x_2, f(x_2))$, with the point $(\alpha_1 x_1 + \alpha_2 x_2,\ \alpha_1 f(x_1) + \alpha_2 f(x_2))$ on it above the abscissa $x = \alpha_1 x_1 + \alpha_2 x_2$.

In fact, the left-hand side of (5.92) contains the value $f(x)$ of the function at the point $x = \alpha_1 x_1 + \alpha_2 x_2 \in [x_1, x_2]$ and the right-hand side contains the value at the same point of the linear function whose (straight-line) graph passes through the points $(x_1, f(x_1))$ and $(x_2, f(x_2))$.

Relation (5.92) means that the set $E = \{(x, y) \in \mathbb{R}^2 \mid x \in ]a,b[,\ f(x) < y\}$ of the points of the plane lying above the graph of the function is convex; hence the term "convex", as applied to the function itself.


Definition 2. If the opposite inequality holds for a function $f : ]a,b[ \to \mathbb{R}$, that function is said to be concave on the interval $]a,b[$, or, more often, convex upward in the interval, as opposed to a convex function, which is then said to be convex downward on $]a,b[$.


Since all our subsequent constructions are carried out in the same way 
for a function that is convex downward or convex upward, we shall limit 
ourselves to functions that are convex downward. 

We first give a new form to the inequality (5.92), better adapted for our purposes.

In the relations $x = \alpha_1 x_1 + \alpha_2 x_2$, $\alpha_1 + \alpha_2 = 1$, we have

$$
\alpha_1 = \frac{x_2 - x}{x_2 - x_1}, \qquad \alpha_2 = \frac{x - x_1}{x_2 - x_1},
$$

so that (5.92) can be rewritten as

$$
f(x) \le \frac{x_2 - x}{x_2 - x_1}\, f(x_1) + \frac{x - x_1}{x_2 - x_1}\, f(x_2).
$$

Taking account of the inequalities $x_1 \le x \le x_2$ and $x_1 < x_2$, we multiply by $x_2 - x_1$, and obtain

$$
(x_2 - x) f(x_1) + (x_1 - x_2) f(x) + (x - x_1) f(x_2) \ge 0.
$$

Remarking that $x_2 - x_1 = (x_2 - x) + (x - x_1)$ we obtain from the last inequality, after elementary transformations,

$$
\frac{f(x) - f(x_1)}{x - x_1} \le \frac{f(x_2) - f(x)}{x_2 - x} \tag{5.93}
$$

for $x_1 < x < x_2$ and any $x_1, x_2 \in ]a,b[$.

Inequality (5.93) is another way of writing the definition of convexity of the function $f(x)$ on an open interval $]a,b[$. Geometrically, (5.93) means (see Fig. 5.11) that the slope of the chord I joining $(x_1, f(x_1))$ to $(x, f(x))$ is not larger than (and in the case of strict convexity is less than) the slope of the chord II joining $(x, f(x))$ to $(x_2, f(x_2))$.

Now let us assume that the function $f : ]a,b[ \to \mathbb{R}$ is differentiable on $]a,b[$. Then, letting $x$ in (5.93) tend first to $x_1$, then to $x_2$, we obtain

$$
f'(x_1) \le \frac{f(x_2) - f(x_1)}{x_2 - x_1} \le f'(x_2),
$$

which establishes that the derivative of $f$ is monotonic.

Taking this fact into account, for a strictly convex function we find, using Lagrange's theorem, that

$$
f'(x_1) \le f'(\xi_1) = \frac{f(x) - f(x_1)}{x - x_1} < \frac{f(x_2) - f(x)}{x_2 - x} = f'(\xi_2) \le f'(x_2)
$$

for $x_1 < \xi_1 < x < \xi_2 < x_2$, that is, strict convexity implies that the derivative is strictly monotonic.

Thus, if a differentiable function $f$ is convex on an open interval $]a,b[$, then $f'$ is nondecreasing on $]a,b[$; and in the case when $f$ is strictly convex, its derivative $f'$ is increasing on $]a,b[$.

These conditions turn out to be not only necessary, but also sufficient for convexity of a differentiable function.

In fact, for $a < x_1 < x < x_2 < b$, by Lagrange's theorem

$$
\frac{f(x) - f(x_1)}{x - x_1} = f'(\xi_1), \qquad \frac{f(x_2) - f(x)}{x_2 - x} = f'(\xi_2),
$$

where $x_1 < \xi_1 < x < \xi_2 < x_2$; and if $f'(\xi_1) \le f'(\xi_2)$, then condition (5.93) for convexity holds (with strict convexity if $f'(\xi_1) < f'(\xi_2)$).

We have thus proved the following proposition.


Proposition 5. A necessary and sufficient condition for a function $f : ]a,b[ \to \mathbb{R}$ that is differentiable on the open interval $]a,b[$ to be convex (downward) on that interval is that its derivative $f'$ be nondecreasing on $]a,b[$. A strictly increasing $f'$ corresponds to a strictly convex function.


Comparing Proposition 5 with Proposition 1, we obtain the following corollary.


Corollary. A necessary and sufficient condition for a function $f : ]a,b[ \to \mathbb{R}$ having a second derivative on the open interval $]a,b[$ to be convex (downward) on $]a,b[$ is that $f''(x) \ge 0$ on that interval. The condition $f''(x) > 0$ on $]a,b[$ is sufficient to guarantee that $f$ is strictly convex.
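Definition (5.92) itself lends itself to a direct numerical probe, which can be compared with the sign of the second derivative. In the following Python sketch the helper name `convexity_defect`, the sampled weights, and the test functions are assumptions made for illustration.

```python
import math

def convexity_defect(f, x1, x2, steps=50):
    """Largest sampled value of f(a1*x1 + a2*x2) - (a1*f(x1) + a2*f(x2)).

    For a convex f this is <= 0 (it is 0 at the weights a1 = 0 and a1 = 1);
    a positive value exhibits a violation of (5.92).
    """
    worst = float("-inf")
    for k in range(steps + 1):
        a1 = k / steps
        a2 = 1 - a1
        worst = max(worst, f(a1*x1 + a2*x2) - (a1*f(x1) + a2*f(x2)))
    return worst

print(convexity_defect(lambda x: x**2, 0.5, 3.0))   # f'' = 2 > 0: no violation
print(convexity_defect(math.exp, -1.0, 2.0))        # f'' = e^x > 0: no violation
print(convexity_defect(math.sqrt, 0.5, 3.0))        # f'' < 0: positive defect
```

A nonpositive defect on one pair of points proves nothing in general; a positive one, however, does disprove convexity.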

We are now in a position to explain, for example, why the graphs of 
the simplest elementary functions are drawn with one form of convexity or 
another. 


Example 9. Let us study the convexity of $f(x) = x^{\alpha}$ on the set $x > 0$. Since $f''(x) = \alpha(\alpha - 1)x^{\alpha - 2}$, we have $f''(x) > 0$ for $\alpha < 0$ or $\alpha > 1$, that is, for these values of the exponent $\alpha$ the power function $x^{\alpha}$ is strictly convex (downward). For $0 < \alpha < 1$ we have $f''(x) < 0$, so that for these exponents it is strictly convex upward. For example, we always draw the parabola $f(x) = x^2$ as convex downward. The other cases $\alpha = 0$ and $\alpha = 1$ are trivial: $x^0 \equiv 1$ and $x^1 = x$. In both of these cases the graph of the function is a ray (see Fig. 5.18).


Example 10. Let $f(x) = a^x$, $0 < a$, $a \ne 1$. Since $f''(x) = a^x \ln^2 a > 0$, the exponential function $a^x$ is strictly convex (downward) on $\mathbb{R}$ for any allowable value of the base $a$ (see Fig. 5.12).


Example 11. For the function $f(x) = \log_a x$ we have $f''(x) = -\dfrac{1}{x^2 \ln a}$, so that the function is strictly convex (downward) if $0 < a < 1$, and strictly convex upward if $1 < a$ (see Fig. 5.13).


Example 12. Let us study the convexity of $f(x) = \sin x$ (see Fig. 5.14).

Since $f''(x) = -\sin x$, we have $f''(x) < 0$ on the intervals $\pi \cdot 2k < x < \pi(2k + 1)$ and $f''(x) > 0$ on $\pi(2k - 1) < x < \pi \cdot 2k$, where $k \in \mathbb{Z}$. It follows from this, for example, that the arc of the graph of $\sin x$ on the closed interval $0 \le x \le \frac{\pi}{2}$ lies above the chord it subtends everywhere except at the endpoints; therefore $\sin x > \frac{2}{\pi} x$ for $0 < x < \frac{\pi}{2}$.

We now point out another characteristic of a convex function, geometrically equivalent to the statement that a convex region of the plane lies entirely on one side of a tangent line to its boundary.


Proposition 6. A function $f : ]a,b[ \to \mathbb{R}$ that is differentiable on the open interval $]a,b[$ is convex (downward) on $]a,b[$ if and only if its graph contains no points below any tangent drawn to it. In that case, a necessary and sufficient condition for strict convexity is that all points of the graph except the point of tangency lie strictly above the tangent line.


Proof. Necessity. Let $x_0 \in ]a,b[$. The equation of the tangent line to the graph at $(x_0, f(x_0))$ has the form

$$
y = f(x_0) + f'(x_0)(x - x_0),
$$

so that

$$
f(x) - y(x) = f(x) - f(x_0) - f'(x_0)(x - x_0) = \big(f'(\xi) - f'(x_0)\big)(x - x_0),
$$

where $\xi$ is a point between $x$ and $x_0$. Since $f$ is convex, the function $f'(x)$ is nondecreasing on $]a,b[$ and so the sign of the difference $f'(\xi) - f'(x_0)$ is the same as the sign of the difference $x - x_0$. Therefore $f(x) - y(x) \ge 0$ at each point $x \in ]a,b[$. If $f$ is strictly convex, then $f'$ is strictly increasing on $]a,b[$ and so $f(x) - y(x) > 0$ for $x \in ]a,b[$ and $x \ne x_0$.

Sufficiency. If the inequality

$$
f(x) - y(x) = f(x) - f(x_0) - f'(x_0)(x - x_0) \ge 0 \tag{5.94}
$$

holds for any points $x, x_0 \in ]a,b[$, then

$$
\frac{f(x) - f(x_0)}{x - x_0} \le f'(x_0) \quad \text{for } x < x_0,
$$

$$
\frac{f(x) - f(x_0)}{x - x_0} \ge f'(x_0) \quad \text{for } x_0 < x.
$$

Thus, for any triple of points $x_1, x, x_2 \in ]a,b[$ such that $x_1 < x < x_2$ we obtain

$$
\frac{f(x) - f(x_1)}{x - x_1} \le \frac{f(x_2) - f(x)}{x_2 - x},
$$

and strict inequality in (5.94) implies strict inequality in this last relation, which, as we see, is the same as the definition (5.93) for convexity of a function. □


Let us now consider some examples. 


Example 13. The function $f(x) = e^x$ is strictly convex. The straight line $y = x + 1$ is tangent to the graph of this function at $(0, 1)$, since $f(0) = e^0 = 1$ and $f'(0) = e^x\big|_{x=0} = 1$. By Proposition 6 we conclude that for any $x \in \mathbb{R}$

$$
e^x \ge 1 + x,
$$

and this inequality is strict for $x \ne 0$.


Example 14. Similarly, using the strict upward convexity of $\ln x$, one can verify that the inequality

$$
\ln x \le x - 1
$$

holds for $x > 0$, the inequality being strict for $x \ne 1$.
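The two tangent-line inequalities just obtained, $e^x \ge 1 + x$ and $\ln x \le x - 1$, can be checked on sample points. The Python sketch below is only an illustration; the helper names and the sample points are assumptions made here.

```python
import math

def tangent_gap_exp(x):
    return math.exp(x) - (1 + x)      # graph of e^x minus its tangent at (0, 1)

def tangent_gap_log(x):
    return (x - 1) - math.log(x)      # tangent at (1, 0) minus graph of ln x

xs = [-3 + 0.25 * k for k in range(25)]          # -3, -2.75, ..., 3
print(min(tangent_gap_exp(x) for x in xs))
print(min(tangent_gap_log(x) for x in (0.1, 0.5, 1.0, 2.0, 10.0)))
```

Both gaps are nonnegative on the samples and vanish only at the points of tangency $x = 0$ and $x = 1$, respectively.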


In constructing the graphs of functions, it is useful to distinguish the 
points of inflection of a graph. 


Definition 3. Let $f : U(x_0) \to \mathbb{R}$ be a function defined and differentiable on a neighborhood $U(x_0)$ of $x_0 \in \mathbb{R}$. If the function is convex downward (resp. upward) on the set $\mathring{U}^-(x_0) = \{x \in U(x_0) \mid x < x_0\}$ and convex upward (resp. downward) on $\mathring{U}^+(x_0) = \{x \in U(x_0) \mid x > x_0\}$, then $(x_0, f(x_0))$ is called a point of inflection of the graph.


Thus when we pass through a point of inflection, the direction of convexity 
of the graph changes. This means, in particular, that at the point (xo, f (xo)) 
the graph of the function passes from one side of the tangent line to the other. 




An analytic criterion for the abscissa $x_0$ of a point of inflection is easy to surmise, if we compare Proposition 5 with Proposition 3. To be specific, one can say that if $f$ is twice differentiable at $x_0$, then since $f'(x)$ has either a maximum or a minimum at $x_0$, we must have $f''(x_0) = 0$.

Now if the second derivative $f''(x)$ is defined on $\mathring{U}(x_0)$ and has one sign everywhere on $\mathring{U}^-(x_0)$ and the opposite sign everywhere on $\mathring{U}^+(x_0)$, this is sufficient for $f'(x)$ to be monotonic in $\mathring{U}^-(x_0)$ and monotonic in $\mathring{U}^+(x_0)$ but with the opposite monotonicity. By Proposition 5, a change in the direction of convexity occurs at $(x_0, f(x_0))$, and so that point is a point of inflection.


Example 15. When considering the function $f(x) = \sin x$ in Example 12 we found the regions of convexity and concavity for its graph. We shall now show that the points of the graph with abscissas $x = \pi k$, $k \in \mathbb{Z}$, are points of inflection.

Indeed, $f''(x) = -\sin x$, so that $f''(x) = 0$ at $x = \pi k$, $k \in \mathbb{Z}$. Moreover, $f''(x)$ changes sign as we pass through these points, which is a sufficient condition for a point of inflection (see Fig. 5.14).


Example 16. It should not be thought that the passing of a curve from one side of its tangent line to the other at a point is a sufficient condition for the point to be a point of inflection. It may, after all, happen that the curve does not have any constant convexity on either a left- or a right-hand neighborhood of the point. An example is easy to construct, by improving Example 5, which was given for just this purpose.

Let

$$
f(x) = \begin{cases} 2x^3 + x^3 \sin\frac{1}{x^2} & \text{for } x \ne 0,\\ 0 & \text{for } x = 0. \end{cases}
$$

Then $x^3 \le f(x) \le 3x^3$ for $0 \le x$ and $3x^3 \le f(x) \le x^3$ for $x \le 0$, so that the graph of this function is tangent to the $x$-axis at $x = 0$ and passes from the lower half-plane to the upper at that point. At the same time, the derivative of $f(x)$

$$
f'(x) = \begin{cases} 6x^2 + 3x^2 \sin\frac{1}{x^2} - 2\cos\frac{1}{x^2} & \text{for } x \ne 0,\\ 0 & \text{for } x = 0 \end{cases}
$$

is not monotonic in any one-sided neighborhood of $x = 0$.


In conclusion, we return again to the definition (5.92) of a convex function 
and prove the following proposition. 


Proposition 7. (Jensen's inequality).¹⁹ If $f : ]a,b[ \to \mathbb{R}$ is a convex function, $x_1, \ldots, x_n$ are points of $]a,b[$, and $\alpha_1, \ldots, \alpha_n$ are nonnegative numbers such that $\alpha_1 + \cdots + \alpha_n = 1$, then

$$
f(\alpha_1 x_1 + \cdots + \alpha_n x_n) \le \alpha_1 f(x_1) + \cdots + \alpha_n f(x_n). \tag{5.95}
$$

19 J. L. Jensen (1859-1925), Danish mathematician.


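Proof. For $n = 2$, condition (5.95) is the same as the definition (5.92) of a convex function.

We shall now show that if (5.95) is valid for $n = m - 1$, it is also valid for $n = m$.

For the sake of definiteness, assume that $\alpha_n \ne 0$ in the set $\alpha_1, \ldots, \alpha_n$. Then $\beta = \alpha_2 + \cdots + \alpha_n > 0$ and $\frac{\alpha_2}{\beta} + \cdots + \frac{\alpha_n}{\beta} = 1$. Using the convexity of the function, we find

$$
f(\alpha_1 x_1 + \cdots + \alpha_n x_n) = f\Big(\alpha_1 x_1 + \beta\Big(\frac{\alpha_2}{\beta} x_2 + \cdots + \frac{\alpha_n}{\beta} x_n\Big)\Big) \le \alpha_1 f(x_1) + \beta f\Big(\frac{\alpha_2}{\beta} x_2 + \cdots + \frac{\alpha_n}{\beta} x_n\Big),
$$

since $\alpha_1 + \beta = 1$ and $\Big(\frac{\alpha_2}{\beta} x_2 + \cdots + \frac{\alpha_n}{\beta} x_n\Big) \in ]a,b[$.

By the induction hypothesis, we now have

$$
f\Big(\frac{\alpha_2}{\beta} x_2 + \cdots + \frac{\alpha_n}{\beta} x_n\Big) \le \frac{\alpha_2}{\beta} f(x_2) + \cdots + \frac{\alpha_n}{\beta} f(x_n).
$$

Consequently

$$
f(\alpha_1 x_1 + \cdots + \alpha_n x_n) \le \alpha_1 f(x_1) + \beta f\Big(\frac{\alpha_2}{\beta} x_2 + \cdots + \frac{\alpha_n}{\beta} x_n\Big) \le \alpha_1 f(x_1) + \alpha_2 f(x_2) + \cdots + \alpha_n f(x_n).
$$

By induction we now conclude that (5.95) holds for any $n \in \mathbb{N}$. (For $n = 1$, relation (5.95) is trivial.) □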


We remark that, as the proof shows, a strict Jensen's inequality corresponds to strict convexity, that is, if the numbers $\alpha_1, \ldots, \alpha_n$ are nonzero, then equality holds in (5.95) if and only if $x_1 = \cdots = x_n$.

For a function that is convex upward, of course, the opposite relation to inequality (5.95) is obtained:

$$
f(\alpha_1 x_1 + \cdots + \alpha_n x_n) \ge \alpha_1 f(x_1) + \cdots + \alpha_n f(x_n). \tag{5.96}
$$


Example 17. The function $f(x) = \ln x$ is strictly convex upward on the set of positive numbers, and so by (5.96)

$$
\alpha_1 \ln x_1 + \cdots + \alpha_n \ln x_n \le \ln(\alpha_1 x_1 + \cdots + \alpha_n x_n)
$$

or,

$$
x_1^{\alpha_1} \cdots x_n^{\alpha_n} \le \alpha_1 x_1 + \cdots + \alpha_n x_n \tag{5.97}
$$

for $x_i \ge 0$, $\alpha_i \ge 0$, $i = 1, \ldots, n$, and $\sum\limits_{i=1}^{n} \alpha_i = 1$.

In particular, if $\alpha_1 = \cdots = \alpha_n = \frac{1}{n}$, we obtain the classical inequality

$$
\sqrt[n]{x_1 \cdots x_n} \le \frac{x_1 + \cdots + x_n}{n} \tag{5.98}
$$

between the geometric and arithmetic means of $n$ nonnegative numbers. Equality holds in (5.98), as noted above, only when $x_1 = x_2 = \cdots = x_n$. If we set $n = 2$, $\alpha_1 = \frac{1}{p}$, $\alpha_2 = \frac{1}{q}$, $x_1 = a$, $x_2 = b$ in (5.97), we again obtain the known inequality (5.86).
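Inequality (5.97) between the weighted geometric and arithmetic means, together with its special cases (5.98) and (5.86), can be spot-checked numerically. In the Python sketch below the helper name `weighted_means` and the sample data are assumptions made for illustration; `math.prod` requires Python 3.8 or later.

```python
import math

def weighted_means(xs, alphas):
    """Return (geometric, arithmetic) weighted means; the weights must sum to 1."""
    geo = math.prod(x**a for x, a in zip(xs, alphas))
    ari = sum(a * x for x, a in zip(xs, alphas))
    return geo, ari

# (5.98): equal weights 1/n
print(weighted_means([1.0, 4.0, 16.0], [1/3, 1/3, 1/3]))

# (5.86): n = 2 with weights 1/p and 1/q, here p = 3, q = 3/2
p = 3.0
q = p / (p - 1)
print(weighted_means([2.0, 5.0], [1/p, 1/q]))
```

In each printed pair the first number should not exceed the second, with equality only when all the $x_i$ coincide.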


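Example 18. Let $f(x) = x^p$, $x \ge 0$, $p > 1$. Since such a function is convex, we have

$$
\Big(\sum_{i=1}^{n} \alpha_i x_i\Big)^p \le \sum_{i=1}^{n} \alpha_i x_i^p.
$$

Setting $q = \frac{p}{p-1}$, $\alpha_i = b_i^q \Big(\sum\limits_{i=1}^{n} b_i^q\Big)^{-1}$, and $x_i = a_i b_i^{-1/(p-1)} \sum\limits_{i=1}^{n} b_i^q$ here, we obtain Hölder's inequality (5.88):

$$
\sum_{i=1}^{n} a_i b_i \le \Big(\sum_{i=1}^{n} a_i^p\Big)^{1/p} \Big(\sum_{i=1}^{n} b_i^q\Big)^{1/q},
$$

where $\frac{1}{p} + \frac{1}{q} = 1$ and $p > 1$.

For $p < 1$ the function $f(x) = x^p$ is convex upward, and so analogous reasoning can be carried out in Hölder's other inequality (5.89).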


5.4.4 L'Hôpital's Rule


We now pause to discuss a special, but very useful device for finding the limit of a ratio of functions, known as l'Hôpital's rule.²⁰


Proposition 8. (l'Hôpital's rule). Suppose the functions $f : ]a,b[ \to \mathbb{R}$ and $g : ]a,b[ \to \mathbb{R}$ are differentiable on the open interval $]a,b[$ ($-\infty \le a < b \le +\infty$) with $g'(x) \ne 0$ on $]a,b[$ and

$$
\frac{f'(x)}{g'(x)} \to A \ \text{ as } x \to a + 0 \quad (-\infty \le A \le +\infty).
$$

Then

$$
\frac{f(x)}{g(x)} \to A \ \text{ as } x \to a + 0
$$

in each of the following two cases:

1° $(f(x) \to 0) \wedge (g(x) \to 0)$ as $x \to a + 0$,

or

2° $g(x) \to \infty$ as $x \to a + 0$.

A similar assertion holds as $x \to b - 0$.


20 G. F. de l'Hôpital (1661-1704), French mathematician, a capable student of Johann Bernoulli, a marquis for whom the latter wrote the first textbook of analysis in the years 1691-1692. The portion of this textbook devoted to differential calculus was published in slightly altered form by l'Hôpital under his own name. Thus "l'Hôpital's rule" is really due to Johann Bernoulli.


L'Hôpital's rule can be stated succinctly, but not quite accurately, as follows. The limit of a ratio of functions equals the limit of the ratio of their derivatives if the latter exists.


Proof. If $g'(x) \ne 0$, we conclude on the basis of Rolle's theorem that $g(x)$ is strictly monotonic on $]a,b[$. Hence, shrinking the interval $]a,b[$ if necessary by shifting toward the endpoint $a$, we can assume that $g(x) \ne 0$ on $]a,b[$. By Cauchy's theorem, for $x, y \in ]a,b[$ there exists a point $\xi \in ]a,b[$ such that

$$
\frac{f(x) - f(y)}{g(x) - g(y)} = \frac{f'(\xi)}{g'(\xi)}.
$$

Let us rewrite this equality in a form convenient for us at this point:

$$
\frac{f(x)}{g(x)} = \frac{f(y)}{g(x)} + \frac{f'(\xi)}{g'(\xi)}\Big(1 - \frac{g(y)}{g(x)}\Big).
$$

As $x \to a + 0$, we shall make $y$ tend to $a + 0$ in such a way that

$$
\frac{f(y)}{g(x)} \to 0 \quad \text{and} \quad \frac{g(y)}{g(x)} \to 0.
$$

This is obviously possible under each of the two hypotheses 1° and 2° that we are considering. Since $\xi$ lies between $x$ and $y$, we also have $\xi \to a + 0$. Hence the right-hand side of the last equality (and therefore the left-hand side also) tends to $A$. □


Example 19.

$$
\lim_{x \to 0} \frac{\sin x}{x} = \lim_{x \to 0} \frac{\cos x}{1} = 1.
$$

This example should not be looked on as a new, independent proof of the relation $\frac{\sin x}{x} \to 1$ as $x \to 0$. The fact is that in deriving the relation $\sin' x = \cos x$ we already made use of the limit just calculated.

We always verify the legitimacy of applying l’Hôpital’s rule after we find 
the limit of the ratio of the derivatives. In doing so, one must not forget to 
verify condition 1° or 2°. The importance of these conditions can be seen in 
the following example. 


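Example 20. Let $f(x) = \cos x$, $g(x) = \sin x$. Then $f'(x) = -\sin x$, $g'(x) = \cos x$, and $\frac{f(x)}{g(x)} \to +\infty$ as $x \to +0$, while $\frac{f'(x)}{g'(x)} \to 0$ as $x \to +0$.

Example 21.

$$
\lim_{x \to +\infty} \frac{\ln x}{x^{\alpha}} = \lim_{x \to +\infty} \frac{1/x}{\alpha x^{\alpha - 1}} = \lim_{x \to +\infty} \frac{1}{\alpha x^{\alpha}} = 0 \quad \text{for } \alpha > 0.
$$

Example 22.

$$
\lim_{x \to +\infty} \frac{x^{\alpha}}{a^x} = \lim_{x \to +\infty} \frac{\alpha x^{\alpha - 1}}{a^x \ln a} = \cdots = \lim_{x \to +\infty} \frac{\alpha(\alpha - 1)\cdots(\alpha - n + 1)\, x^{\alpha - n}}{a^x (\ln a)^n} = 0
$$

for $a > 1$, since for $n > \alpha$ and $a > 1$ it is obvious that $\frac{x^{\alpha - n}}{a^x} \to 0$ if $x \to +\infty$.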
We remark that this entire chain of equalities was hypothetical until we 
arrived at an expression whose limit we could find. 
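The limits of Examples 21 and 22, and the warning of Example 20, can be illustrated by evaluating the ratios at a few points. The following Python sketch is a numerical illustration only; the function names, the exponents, the base, and the sample points are assumptions made here.

```python
import math

def ratio_log_power(x, alpha):
    return math.log(x) / x**alpha        # Example 21: tends to 0 for alpha > 0

def ratio_power_exp(x, alpha, a):
    return x**alpha / a**x               # Example 22: tends to 0 for a > 1

def example_20(x):
    """Return (f/g, f'/g') for f = cos, g = sin; neither 1° nor 2° holds at +0."""
    return math.cos(x) / math.sin(x), -math.sin(x) / math.cos(x)

for x in (1e1, 1e2, 1e3):
    print(x, ratio_log_power(x, 0.5), ratio_power_exp(x, 3.0, 1.5))

print(example_20(1e-6))   # the first ratio is huge, the second is close to 0
```

In Example 20 the ratio of derivatives has a limit, but it says nothing about the ratio of the functions, because the hypotheses of the rule fail.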


5.4.5 Constructing the Graph of a Function 


A graphical representation is often used to gain a visualizable description of 
a function. As a rule, such a representation is useful in discussing qualitative 
questions about the behavior of the function being studied. 

For precise computations graphs are used more rarely. In this connection 
what is important is not so much a scrupulous reproduction of the function in 
the form of a graph as the construction of a sketch of the graph of the function 
that correctly reflects the main elements of its behavior. In this subsection 
we shall study some general devices that are encountered in constructing a 
sketch of the graph of a function. 


a. Graphs of the Elementary Functions We recall first of all what the 
graphs of the main elementary functions look like. A complete mastery of 
these is needed for what follows (Figs. 5.12-5.18). 


Figs. 5.12-5.18. Graphs of the main elementary functions.


b. Examples of Sketches of Graphs of Functions (Without Appli- 
cation of the Differential Calculus) Let us now consider some examples 
in which a sketch of the graph of a function can be easily constructed if we 
know the graphs and properties of the simplest elementary functions. 


Example 23. Let us construct a sketch of the graph of the function

$$
h = \log_{x^2 - 3x + 2} 2.
$$

Taking account of the relation

$$
h = \log_{x^2 - 3x + 2} 2 = \frac{1}{\log_2(x^2 - 3x + 2)} = \frac{1}{\log_2 (x - 1)(x - 2)},
$$

we construct successively the graph of the quadratic trinomial $y_1 = x^2 - 3x + 2$, then $y_2 = \log_2 y_1(x)$, and then $y = \frac{1}{y_2(x)}$ (Fig. 5.19).

The shape of this graph could have been "guessed" in a different way: by first determining the domain of definition of the function $\log_{x^2 - 3x + 2} 2 = \big(\log_2(x^2 - 3x + 2)\big)^{-1}$, then finding the behavior of the function under approach to the boundary points of the domain of definition and on intervals whose endpoints are the boundary points of the domain of definition, and finally drawing a "smooth curve" taking account of the behavior thus determined at the ends of the interval.


Example 24. The construction of a sketch of the graph of the function

$$
y = \sin(x^2)
$$

can be seen in Fig. 5.20.

We have constructed this graph using certain characteristic points for this function, the points where $\sin(x^2) = -1$, $\sin(x^2) = 0$, or $\sin(x^2) = 1$. Between two adjacent points of this type the function is monotonic. The form of the graph near the point $x = 0$, $y = 0$ is determined by the fact that $\sin(x^2) \sim x^2$ as $x \to 0$. Moreover, it is useful to note that this function is even.

Since we will be speaking only of sketches rather than a precise construction of the graph of a function, let us agree for the sake of brevity to understand that "constructing the graph of a function" is equivalent to "constructing a sketch of the graph of the function".


Example 25. Let us construct the graph of the function
$$y = x + \arctan(x^3 - 1)$$
(Fig. 5.21). As $x \to -\infty$ the graph is well approximated by the line
$y = x - \frac{\pi}{2}$, while for $x \to +\infty$ it is approximated by
$y = x + \frac{\pi}{2}$.


We now introduce a useful concept. 


Definition 4. The line $c_0 + c_1x$ is called an asymptote of the graph of the
function $y = f(x)$ as $x \to -\infty$ (or $x \to +\infty$) if
$f(x) - (c_0 + c_1x) = o(1)$ as $x \to -\infty$ (or $x \to +\infty$).

Thus in the present example the graph has the asymptote $y = x - \frac{\pi}{2}$
as $x \to -\infty$ and $y = x + \frac{\pi}{2}$ as $x \to +\infty$.

If $|f(x)| \to \infty$ as $x \to a - 0$ (or as $x \to a + 0$) it is clear that
the graph of the function will move ever closer to the vertical line $x = a$ as
$x$ approaches $a$. We call this line a vertical asymptote of the graph, in
contrast to the asymptotes introduced in Definition 4, which are always oblique.

Thus, the graph in Example 23 (see Fig. 5.19) has two vertical asymptotes
and one horizontal asymptote (the same asymptote as $x \to -\infty$ and as
$x \to +\infty$).

Fig. 5.21.


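It obviously follows from Definition 4 that
$$c_1 = \lim_{x\to-\infty}\frac{f(x)}{x} , \qquad
c_0 = \lim_{x\to-\infty}\big(f(x) - c_1x\big) .$$
In general, if $f(x) - (c_0 + c_1x + \cdots + c_nx^n) = o(1)$ as $x \to -\infty$,
then
$$c_n = \lim_{x\to-\infty}\frac{f(x)}{x^n} , \qquad
c_{n-1} = \lim_{x\to-\infty}\frac{f(x) - c_nx^n}{x^{n-1}} , \qquad \ldots , \qquad
c_0 = \lim_{x\to-\infty}\big(f(x) - (c_1x + \cdots + c_nx^n)\big) .$$
These relations, written out here for the case $x \to -\infty$, are of course
also valid in the case $x \to +\infty$ and can be used to describe the asymptotic
behavior of the graph of a function $f(x)$ using the graph of the corresponding
algebraic polynomial $c_0 + c_1x + \cdots + c_nx^n$.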
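
As a rough numerical illustration of these limit formulas (a sketch only, not
part of the original exposition), the following Python fragment estimates $c_1$
and $c_0$ for the function of Example 25. The helper name
`asymptote_coefficients` and the sample points `far` and `near` are assumptions
made for this illustration; finite sample points can only approximate the limits.

```python
import math

def f(x):
    # Function from Example 25: y = x + arctan(x^3 - 1).
    return x + math.atan(x**3 - 1)

def asymptote_coefficients(f, sign=1, far=1e12, near=1e4):
    """Estimate (c1, c0) of the asymptote c0 + c1*x as x -> sign * infinity.

    c1 approximates lim f(x)/x at a very distant point; c0 approximates
    lim (f(x) - c1*x) at a nearer point so that rounding errors stay small.
    """
    c1 = f(sign * far) / (sign * far)
    x = sign * near
    c0 = f(x) - c1 * x
    return c1, c0

c1_plus, c0_plus = asymptote_coefficients(f, sign=1)
c1_minus, c0_minus = asymptote_coefficients(f, sign=-1)
print(c1_plus, c0_plus)
print(c1_minus, c0_minus)
```

Both slopes should come out close to 1, and the intercepts close to
$\frac{\pi}{2}$ and $-\frac{\pi}{2}$ respectively, in agreement with the
asymptotes found for this example.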


Example 26. Let $(\rho, \varphi)$ be polar coordinates in the plane and suppose a
point is moving in the plane in such a way that
$$\rho = \rho(t) = 1 - e^{-t}\cos\frac{\pi}{2}t , \qquad
\varphi = \varphi(t) = 1 - e^{-t}\sin\frac{\pi}{2}t$$
at time $t$ ($t > 0$). Draw the trajectory of the point.

In order to do this, we first draw the graphs of $\rho(t)$ and $\varphi(t)$
(Figs. 5.22a and 5.22b).

Then, looking simultaneously at both of the graphs just constructed, we 
can describe the general form of the trajectory of the point (Fig. 5.22c). 


Fig. 5.22. 


c. The Use of Differential Calculus in Constructing the Graph of a 
Function As we have seen, the graphs of many functions can be drawn in 
their general features without going beyond the most elementary considera- 
tions. However, if we want to make the sketch more precise, we can use the 
machinery of differential calculus in cases where the derivative of the function 
being studied is not too complicated. We shall illustrate this using examples. 


Example 27. Construct the graph of the function $y = f(x)$ when
$$f(x) = |x + 2|e^{-1/x} .$$

Fig. 5.23.

The function $f(x)$ is defined for $x \in \mathbb{R} \setminus 0$. Since
$e^{-1/x} \to 1$ as $x \to \infty$, it follows that
$$|x + 2|e^{-1/x} \sim
\begin{cases}
-(x + 2) & \text{as } x \to -\infty ,\\
(x + 2) & \text{as } x \to +\infty .
\end{cases}$$

Next, it is obvious that $|x + 2|e^{-1/x} \to +\infty$ as $x \to -0$, and
$|x + 2|e^{-1/x} \to 0$ as $x \to +0$. Finally, it is clear that $f(x) \ge 0$
and $f(-2) = 0$. On the basis of these observations, one can already make a
first draft of the graph (Fig. 5.23a).

Let us now see for certain whether this function is monotonic on the
intervals $]-\infty, -2[$, $[-2, 0[$, and $]0, +\infty[$, whether it really does
have these asymptotics, and whether the convexity of the graph is correctly
shown.

Since
$$f'(x) =
\begin{cases}
-\dfrac{x^2 + x + 2}{x^2}\,e^{-1/x} , & \text{if } x < -2 ,\\[2mm]
\dfrac{x^2 + x + 2}{x^2}\,e^{-1/x} , & \text{if } -2 < x \text{ and } x \ne 0 ,
\end{cases}$$
and $f'(x) \ne 0$, we can form the following table:

Interval            ]-∞, -2[     ]-2, 0[      ]0, +∞[
Sign of f'(x)       -            +            +
Behavior of f(x)    +∞ ↘ 0       0 ↗ +∞       0 ↗ +∞

On the regions of constant sign of the derivative, as we know, the function
exhibits the corresponding monotonicity. In the bottom row of the table the
symbol +∞ ↘ 0 denotes a monotonic decrease in the values of the function
from +∞ to 0, and 0 ↗ +∞ denotes monotonic increase from 0 to +∞.

We observe that $f'(x) \to -e^{1/2}$ as $x \to -2 - 0$ and $f'(x) \to e^{1/2}$
as $x \to -2 + 0$ (at $x = -2$ the factor $\frac{x^2+x+2}{x^2}$ equals 1 and
$e^{-1/x} = e^{1/2}$), so that the point $(-2, 0)$ must be a cusp in the graph
(a bend of the same type as in the graph of the function $|x|$), and not a
regular point, as depicted in Fig. 5.23a. Next, $f'(x) \to 0$ as $x \to +0$, so
that the graph should emanate from the origin tangentially to the x-axis
(remember the geometric meaning of $f'(x)$!).

We now make the asymptotics of the function as $x \to -\infty$ and
$x \to +\infty$ more precise.

Since $e^{-1/x} = 1 - x^{-1} + o(x^{-1})$ as $x \to \infty$, it follows that
$$|x + 2|e^{-1/x} =
\begin{cases}
-x - 1 + o(1) & \text{as } x \to -\infty ,\\
x + 1 + o(1) & \text{as } x \to +\infty ,
\end{cases}$$
so that in fact the oblique asymptotes of the graph are $y = -x - 1$ as
$x \to -\infty$ and $y = x + 1$ as $x \to +\infty$.

From these data we can already construct a quite reliable sketch of the
graph, but we shall go further and find the regions of convexity of the graph
by computing the second derivative:
$$f''(x) =
\begin{cases}
-\dfrac{2 - 3x}{x^4}\,e^{-1/x} , & \text{if } x < -2 ,\\[2mm]
\dfrac{2 - 3x}{x^4}\,e^{-1/x} , & \text{if } -2 < x \text{ and } x \ne 0 .
\end{cases}$$

Since $f''(x) = 0$ only at $x = 2/3$, we have the following table:

Interval             ]-∞, -2[    ]-2, 0[     ]0, 2/3[    ]2/3, +∞[
Sign of f''(x)       -           +           +           -
Convexity of f(x)    Upward      Downward    Downward    Upward

Since the function is differentiable at $x = 2/3$ and $f''(x)$ changes sign as
$x$ passes through that point, the point $(2/3, f(2/3))$ is a point of inflection
of the graph.

Incidentally, if the derivative f'(x) had had a zero, it would have been 
possible to judge using the table of values of f'(x) whether the corresponding 
point was an extremum. In this case, however, f'(x) has no zeros, even though 
the function has a local minimum at x = —2. It is continuous at that point 
and f'(x) changes from negative to positive as x passes through that point. 
Still, the fact that the function has a minimum at x = —2 can be seen just 
from the description of the variation of values of f(x) on the corresponding 
intervals, taking into account, of course, the relation f(—2) = 0. 

We can now draw a more precise sketch of the graph of this function 
(Fig. 5.23b). 
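
The derivative formulas used in Example 27 can be sanity-checked numerically.
The sketch below is an illustration only: the names `f1`, `f2`, and
`central_diff` are hypothetical helpers, and the step size `h` is an arbitrary
choice. It compares the closed-form expressions for $f'$ and $f''$ with central
finite differences.

```python
import math

def f(x):
    # f(x) = |x + 2| * exp(-1/x), defined for x != 0.
    return abs(x + 2) * math.exp(-1 / x)

def f1(x):
    # Closed-form first derivative (valid for x != 0 and x != -2).
    sign = -1.0 if x < -2 else 1.0
    return sign * (x * x + x + 2) / x**2 * math.exp(-1 / x)

def f2(x):
    # Closed-form second derivative (valid for x != 0 and x != -2).
    sign = -1.0 if x < -2 else 1.0
    return sign * (2 - 3 * x) / x**4 * math.exp(-1 / x)

def central_diff(g, x, h=1e-5):
    # Symmetric difference quotient approximating g'(x).
    return (g(x + h) - g(x - h)) / (2 * h)

for x0 in (-5.0, -1.0, 0.5, 3.0):
    print(x0, central_diff(f, x0), f1(x0), central_diff(f1, x0), f2(x0))
```

At each sample point the finite-difference values should agree with the
closed-form ones to several digits, and `f2` should change sign between 0.6
and 0.7, consistent with the inflection point at $x = 2/3$.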


We conclude with one more example.

Example 28. Let $(x, y)$ be Cartesian coordinates in the plane and suppose a
moving point has coordinates
$$x = \frac{t}{1 - t^2} , \qquad y = \frac{t - 2t^3}{1 - t^2}$$
at time $t$ ($t > 0$). Describe the trajectory of the point.

We begin by sketching the graphs of each of the two coordinate functions
$x = x(t)$ and $y = y(t)$ (Figs. 5.24a and 5.24b).

The second of these graphs is somewhat more interesting than the first,
and so we shall describe how to construct it.


Fig. 5.24.

We can see the behavior of the function $y = y(t)$ as $t \to +0$, $t \to 1 - 0$,
$t \to 1 + 0$, and the asymptote $y(t) = 2t + o(1)$ as $t \to +\infty$
immediately from the form of the analytic expression for $y(t)$.

After computing the derivative
$$\dot y(t) = \frac{1 - 5t^2 + 2t^4}{(1 - t^2)^2} ,$$
we find its zeros: $t_1 \approx 0.5$ and $t_2 \approx 1.5$ in the region $t > 0$.
Then, by compiling the table

Interval            ]0, t_1[       ]t_1, 1[        ]1, t_2[        ]t_2, +∞[
Sign of ẏ(t)        +              -               -               +
Behavior of y(t)    0 ↗ y(t_1)     y(t_1) ↘ -∞     +∞ ↘ y(t_2)     y(t_2) ↗ +∞

we find the regions of monotonicity and the local extreme values
$y(t_1) \approx \frac{1}{3}$ (a maximum) and $y(t_2) \approx 4$ (a minimum).

Now, by studying both graphs $x = x(t)$ and $y = y(t)$ simultaneously, we
make a sketch of the trajectory of the point in the plane (Fig. 5.24c).

This sketch can be made more precise. For example, one can determine
the asymptotics of the trajectory.

Since $\lim_{t\to1}\frac{y(t)}{x(t)} = -1$ and $\lim_{t\to1}\big(y(t) + x(t)\big) = 2$,
the line $y = -x + 2$ is an asymptote for both ends of the trajectory,
corresponding to $t$ approaching 1. It is also clear that the line $x = 0$ is a
vertical asymptote for the portion of the trajectory corresponding to
$t \to +\infty$.

We find next
$$y'_x = \frac{\dot y_t}{\dot x_t} = \frac{1 - 5t^2 + 2t^4}{1 + t^2} .$$
As one can easily see, the function $\frac{1 - 5u + 2u^2}{1 + u}$ decreases
monotonically from 1 to $-1$ as $u$ increases from 0 to 1 and increases from
$-1$ to $+\infty$ as $u$ increases from 1 to $+\infty$.

From the monotonic nature of $y'_x$, one can draw conclusions about the
convexity of the trajectory on the corresponding regions. Taking account of
what has just been said, one can construct the following, more precise sketch
of the trajectory of the point (Fig. 5.24d).

If we had considered the trajectory for t < 0 as well, the fact that x(t) 
and y(t) are odd functions would have added to the curves already drawn in 
the xy-plane the curves obtained from them by reflection in the origin. 
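
The numerical values quoted in Example 28 can be reproduced with a few lines of
Python. This is an illustrative sketch under the assumption that the coordinate
functions are $x = t/(1-t^2)$ and $y = (t - 2t^3)/(1-t^2)$; the variable names
are chosen here for convenience. The zeros of $\dot y$ follow from the quadratic
$2u^2 - 5u + 1 = 0$ in $u = t^2$.

```python
import math

def x(t):
    return t / (1 - t * t)

def y(t):
    return (t - 2 * t**3) / (1 - t * t)

# Zeros of 1 - 5t^2 + 2t^4 for t > 0: solve 2u^2 - 5u + 1 = 0 with u = t^2.
u1 = (5 - math.sqrt(17)) / 4
u2 = (5 + math.sqrt(17)) / 4
t1, t2 = math.sqrt(u1), math.sqrt(u2)

print("critical times:", t1, t2)
print("extreme values:", y(t1), y(t2))

# Near t = 1 the sum y + x should be close to 2 (asymptote y = -x + 2).
t_near = 1 - 1e-6
print("y + x near t = 1:", y(t_near) + x(t_near))
```

The printed critical times should be close to 0.5 and 1.5, and the extreme
values close to $\frac{1}{3}$ and 4, matching the rounded figures in the text.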


We now summarize some of these results as very general recommendations 
for the order in which to proceed when constructing the graph of a function 
given analytically. Here they are:

1° Give the domain of definition of the function.

2° Note the specific properties of the function if they are obvious (for exam- 
ple, evenness or oddness, periodicity, identity to the graphs of well-known 
functions up to simple coordinate changes). 

3° Determine the asymptotic behavior of the function under approach to 
boundary points of the domain of definition and, in particular, find 
asymptotes if they exist. 

4° Find the intervals of monotonicity of the function and exhibit its local 
extreme values. 

5° Determine the convexity properties of the graph and indicate the points 
of inflection. 

6° Note any characteristic points of the graph, in particular points of inter- 
section with the coordinate axes, provided there are such and they are 
amenable to computation. 


5.4.6 Problems and Exercises 


1. Let $x = (x_1, \ldots, x_n)$ and $\alpha = (\alpha_1, \ldots, \alpha_n)$, where
$x_i > 0$, $\alpha_i > 0$ for $i = 1, \ldots, n$ and $\sum_{i=1}^n \alpha_i = 1$.
For any number $t \ne 0$ we consider the mean of order $t$ of the numbers
$x_1, \ldots, x_n$ with weights $\alpha_i$:
$$M_t(x, \alpha) = \Big(\sum_{i=1}^n \alpha_i x_i^t\Big)^{1/t} .$$
In particular, when $\alpha_1 = \cdots = \alpha_n = \frac{1}{n}$ we obtain the
harmonic, arithmetic, and quadratic means for $t = -1, 1, 2$ respectively.

Show that
a) $\lim_{t\to0} M_t(x, \alpha) = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$, that is, in
the limit one can obtain the geometric mean;
b) $\lim_{t\to+\infty} M_t(x, \alpha) = \max_{1\le i\le n} x_i$;
c) $\lim_{t\to-\infty} M_t(x, \alpha) = \min_{1\le i\le n} x_i$;
d) $M_t(x, \alpha)$ is a nondecreasing function of $t$ on $\mathbb{R}$ and is
strictly increasing if $n > 1$ and the numbers $x_i$ are all nonzero.
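
A numerical experiment can make the statements of Problem 1 plausible before
one proves them. The sketch below is only an illustration (it proves nothing);
the function name `power_mean` and the sample data are assumptions introduced
here, and the value at $t = 0$ is defined as the geometric mean by convention.

```python
import math

def power_mean(xs, alphas, t):
    """Weighted mean of order t; t = 0 is taken to be the geometric mean."""
    if t == 0:
        return math.exp(sum(a * math.log(x) for x, a in zip(xs, alphas)))
    return sum(a * x**t for x, a in zip(xs, alphas)) ** (1 / t)

xs = [1.0, 2.0, 4.0]
alphas = [1 / 3] * 3
orders = (-50, -1, -1e-6, 0, 1e-6, 1, 2, 50)
values = [power_mean(xs, alphas, t) for t in orders]
for t, v in zip(orders, values):
    print(t, v)
```

For this sample the printed values increase with $t$, stay between
$\min x_i = 1$ and $\max x_i = 4$, and pass through the geometric mean 2 near
$t = 0$.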


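2. Show that $|1 + x|^p \ge 1 + px + c_p\varphi_p(x)$, where $c_p$ is a constant
depending only on $p$,
$$\varphi_p(x) =
\begin{cases}
|x|^2 & \text{for } |x| \le 1 ,\\
|x|^p & \text{for } |x| > 1 ,
\end{cases}
\qquad \text{if } 1 < p \le 2 ,$$
and $\varphi_p(x) = |x|^p$ on $\mathbb{R}$ if $2 < p$.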


3. Verify that $\cos x < \left(\frac{\sin x}{x}\right)^3$ for
$0 < |x| < \frac{\pi}{2}$.


4. Study the function $f(x)$ and construct its graph if
a) $f(x) = \arctan \log_2 \cos\left(\pi x + \frac{\pi}{4}\right)$;

b) $f(x) = \arccos\left(\frac{3}{2} - \sin x\right)$;

c) $f(x) = \sqrt[3]{x(x + 3)^2}$.

d) Construct the curve defined in polar coordinates by the equation
$\varphi = \frac{\rho}{\rho^2 + 1}$, $\rho \ge 0$, and exhibit its asymptotics.

e) Show how, knowing the graph of the function $y = f(x)$, one can obtain the
graphs of the following functions: $f(x) + B$, $Af(x)$, $f(x + b)$, $f(ax)$, and,
in particular, $-f(x)$ and $f(-x)$.


5. Show that if $f \in C(]a, b[)$ and the inequality
$$f\left(\frac{x_1 + x_2}{2}\right) \le \frac{f(x_1) + f(x_2)}{2}$$
holds for any points $x_1, x_2 \in\, ]a, b[$, then the function $f$ is convex on
$]a, b[$.


6. Show that 
a) if a convex function f : R — R is bounded, it is constant; 
b) if 


for a convex function f : R — R, then f is constant. 


c) for any convex function $f$ defined on an open interval $a < x < +\infty$ (or
$-\infty < x < a$), the ratio $\frac{f(x)}{x}$ tends to a finite limit or to
infinity as $x$ tends to infinity in the domain of definition of the function.


7. Show that if $f : ]a, b[ \to \mathbb{R}$ is a convex function, then

a) at any point $x \in\, ]a, b[$ it has a left-hand derivative $f'_-$ and a
right-hand derivative $f'_+$, defined as
$$f'_-(x) = \lim_{h\to-0}\frac{f(x + h) - f(x)}{h} , \qquad
f'_+(x) = \lim_{h\to+0}\frac{f(x + h) - f(x)}{h} ,$$
and $f'_-(x) \le f'_+(x)$;
b) the inequality $f'_+(x_1) \le f'_-(x_2)$ holds for $x_1, x_2 \in\, ]a, b[$ and
$x_1 < x_2$;
c) the set of cusps of the graph of $f(x)$ (for which $f'_-(x) \ne f'_+(x)$) is
at most countable.


8. The Legendre transform²¹ of a function $f : I \to \mathbb{R}$ defined on an
interval $I \subset \mathbb{R}$ is the function
$$f^*(t) = \sup_{x\in I}\big(tx - f(x)\big) .$$

²¹ A. M. Legendre (1752-1833), famous French mathematician.

Show that

a) The set $I^*$ of values of $t \in \mathbb{R}$ for which $f^*(t) \in \mathbb{R}$
(that is, $f^*(t) \ne \infty$) is either empty or consists of a single point, or
is an interval of the line, and in this last case the function $f^*(t)$ is
convex on $I^*$.

b) If $f$ is a convex function, then $I^* \ne \varnothing$, and for
$f^* \in C(I^*)$
$$(f^*)^*(x) = \sup_{t\in I^*}\big(xt - f^*(t)\big) = f(x)$$
for any $x \in I$. Thus the Legendre transform of a convex function is
involutive (its square is the identity transform).

c) The following inequality holds:
$$xt \le f(x) + f^*(t) \quad \text{for } x \in I \text{ and } t \in I^* .$$

d) When $f$ is a convex differentiable function, $f^*(t) = tx_t - f(x_t)$, where
$x_t$ is determined from the equation $t = f'(x)$. Use this relation to obtain a
geometric interpretation of the Legendre transform $f^*$ and its argument $t$,
showing that the Legendre transform is a function defined on the set of
tangents to the graph of $f$.

e) The Legendre transform of the function $f(x) = \frac{1}{\alpha}x^\alpha$ for
$\alpha > 1$ and $x \ge 0$ is the function $f^*(t) = \frac{1}{\beta}t^\beta$,
where $t \ge 0$ and $\frac{1}{\alpha} + \frac{1}{\beta} = 1$. Taking account of
c), use this fact to obtain Young's inequality, which we already know:
$$xt \le \frac{1}{\alpha}x^\alpha + \frac{1}{\beta}t^\beta .$$

f) The Legendre transform of the function $f(x) = e^x$ is the function
$f^*(t) = t\ln\frac{t}{e}$, $t > 0$, and the inequality
$$xt \le e^x + t\ln\frac{t}{e}$$
holds for $x \in \mathbb{R}$ and $t > 0$.
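
Before working part f) analytically, one can compare a brute-force supremum
with the claimed closed form. The following is a hedged sketch: the function
`legendre_numeric`, the search window `[lo, hi]`, and the grid size are
assumptions made for illustration, and a grid search only approximates the
supremum (and only if the maximizer lies inside the window).

```python
import math

def legendre_numeric(f, t, lo=-20.0, hi=20.0, steps=40_001):
    """Approximate f*(t) = sup_x (t*x - f(x)) on a uniform grid over [lo, hi]."""
    best = -math.inf
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        best = max(best, t * x - f(x))
    return best

for t in (0.5, 1.0, 3.0):
    numeric = legendre_numeric(math.exp, t)
    closed_form = t * math.log(t / math.e)
    print(t, numeric, closed_form)
```

For $f(x) = e^x$ the two columns should agree closely, since the supremum is
attained at $x_t = \ln t$, where $t = f'(x_t)$, as described in part d).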


9. Curvature and the radius and center of curvature of a curve at a point. Suppose a 
point is moving in the plane according to a law given by a pair of twice-differentiable 
coordinate functions of time: x = x(t), y = y(t). In doing so, it describes a certain 
curve, which is said to be given in the parametric form x = x(t), y = y(t). A special
case of such a definition is that of the graph of a function y = f(x), where one may 
take x = t, y = f(t). We wish to find a number that characterizes the curvature of 
the curve at a point, as the reciprocal of the radius of a circle serves as an indication 
of the amount of bending of the circle. We shall make use of this comparison. 


a) Find the tangential and normal components $a_t$ and $a_n$ respectively of
the acceleration $a = (\ddot x(t), \ddot y(t))$ of the point, that is, write $a$
as the sum $a_t + a_n$, where $a_t$ is collinear with the velocity vector
$v(t) = (\dot x(t), \dot y(t))$, so that $a_t$ points along the tangent to the
trajectory and $a_n$ is directed along the normal to the trajectory.

b) Show that the relation
$$r = \frac{|v(t)|^2}{|a_n(t)|}$$
holds for motion along a circle of radius $r$.

c) For motion along any curve, taking account of b), it is natural to call the
quantity
$$r(t) = \frac{|v(t)|^2}{|a_n(t)|}$$
the radius of curvature of the curve at the point $(x(t), y(t))$.

Show that the radius of curvature can be computed from the formula
$$r(t) = \frac{(\dot x^2 + \dot y^2)^{3/2}}{|\dot x\ddot y - \ddot x\dot y|} .$$

d) The reciprocal of the radius of curvature is called the absolute curvature
of a plane curve at the point $(x(t), y(t))$. Along with the absolute curvature
we consider the quantity
$$k(t) = \frac{\dot x\ddot y - \ddot x\dot y}{(\dot x^2 + \dot y^2)^{3/2}} ,$$
called the curvature.

Show that the sign of the curvature characterizes the direction of turning of the
curve relative to its tangent. Determine the physical dimension of the curvature.

e) Show that the curvature of the graph of a function $y = f(x)$ at a point
$(x, f(x))$ can be computed from the formula
$$k(x) = \frac{y''(x)}{[1 + (y')^2]^{3/2}} .$$

Compare the signs of $k(x)$ and $y''(x)$ with the direction of convexity of the graph.
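
The curvature formulas of parts d) and e) are easy to test on curves whose
curvature is known in advance. The sketch below assumes the standard expression
$k = (\dot x\ddot y - \ddot x\dot y)/(\dot x^2 + \dot y^2)^{3/2}$; the function
names are hypothetical helpers introduced for this illustration.

```python
import math

def curvature(xd, yd, xdd, ydd):
    """Signed curvature from the first and second derivatives of (x(t), y(t))."""
    speed_sq = xd * xd + yd * yd
    if speed_sq == 0:
        raise ValueError("velocity must be non-zero")
    return (xd * ydd - xdd * yd) / speed_sq ** 1.5

def circle_curvature(r, t):
    # Counterclockwise circle x = r cos t, y = r sin t.
    return curvature(-r * math.sin(t), r * math.cos(t),
                     -r * math.cos(t), -r * math.sin(t))

def graph_curvature(dy, d2y):
    # Graph y = f(x) parametrized by x = t, so x' = 1 and x'' = 0.
    return curvature(1.0, dy, 0.0, d2y)

print(circle_curvature(2.0, 0.7))   # expected to be close to 1/2
print(graph_curvature(0.0, 2.0))    # vertex of y = x^2
```

For a circle of radius $r$ traversed counterclockwise the result is $1/r$ at
every $t$, and for a graph the sign of the result coincides with the sign of
$y''$.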


f) Choose the constants $a$, $b$, and $R$ so that the circle
$(x - a)^2 + (y - b)^2 = R^2$ has the highest possible order of contact with the
given parametrically defined curve $x = x(t)$, $y = y(t)$. It is assumed that
$x(t)$ and $y(t)$ are twice differentiable and that
$(\dot x(t_0), \dot y(t_0)) \ne (0, 0)$.

This circle is called the osculating circle of the curve at the point
$(x_0, y_0)$. Its center is called the center of curvature of the curve at the
point $(x_0, y_0)$. Verify that its radius equals the radius of curvature of the
curve at that point, as defined in c).


g) Under the influence of gravity a particle begins to slide without any
preliminary impetus from the tip of an iceberg of parabolic cross-section. The
equation of the cross-section is $x + y^2 = 1$, where $x \ge 0$, $y \ge 0$.
Compute the trajectory of motion of the particle until it reaches the ground.


5.5 Complex Numbers and the Connections
Among the Elementary Functions


5.5.1 Complex Numbers 


Just as the equation $x^2 = 2$ has no solutions in the domain Q of rational
numbers, the equation $x^2 = -1$ has no solutions in the domain R of real
numbers. And, just as we adjoin the symbol $\sqrt{2}$ as a solution of $x^2 = 2$
and connect it with rational numbers to get new numbers of the form
$r_1 + \sqrt{2}\,r_2$, where $r_1, r_2 \in$ Q, we introduce the symbol $i$ as a
solution of $x^2 = -1$ and attach this number, which lies outside the real
numbers, to real numbers and arithmetic operations in R.

One remarkable feature of this enlargement of the field R of real numbers, 
among many others, is that in the resulting field C of complex numbers, every 
algebraic equation with real or complex coefficients now has a solution. 

Let us now carry out this program. 


a. Algebraic Extension of the Field R Thus, following Euler, we introduce
a number $i$, the imaginary unit, such that $i^2 = -1$. The interaction
between $i$ and the real numbers is to consist of the following. One may
multiply $i$ by numbers $y \in$ R, that is, numbers of the form $iy$ necessarily
arise, and one may add such numbers to real numbers, that is, numbers of the
form $x + iy$ occur, where $x, y \in$ R.

If we wish to have the usual operations of a commutative addition and 
a commutative multiplication that is distributive with respect to addition 
defined on the set of objects of the form x + iy (which, following Gauss, we 
shall call the complex numbers), then we must make the following definitions: 


$$(x_1 + iy_1) + (x_2 + iy_2) := (x_1 + x_2) + i(y_1 + y_2) \tag{5.99}$$
and
$$(x_1 + iy_1) \cdot (x_2 + iy_2) := (x_1x_2 - y_1y_2) + i(x_1y_2 + x_2y_1) . \tag{5.100}$$

Two complex numbers $x_1 + iy_1$ and $x_2 + iy_2$ are considered equal if and
only if $x_1 = x_2$ and $y_1 = y_2$.

We identify the real numbers $x \in$ R with the numbers of the form $x + i \cdot 0$,
and $i$ with the number $0 + i \cdot 1$. The role of 0 in the complex numbers, as
can be seen from Eq. (5.99), is played by the number $0 + i \cdot 0 = 0 \in$ R; the
role of 1, as can be seen from Eq. (5.100), is played by $1 + i \cdot 0 = 1 \in$ R.

It follows from properties of the real numbers and definitions (5.99) and 
(5.100) that the set of complex numbers is a field containing R as a subfield. 

We shall denote the field of complex numbers by C and typical elements 
of it usually by z and w. 

The only nonobvious point in the verification that C is a field is the
assertion that every non-zero complex number $z = x + iy$ has an inverse
$z^{-1}$ with respect to multiplication (a reciprocal), that is,
$z \cdot z^{-1} = 1$. Let us verify this.
We call the number $x - iy$ the conjugate of $z = x + iy$, and we denote it
$\bar z$. We observe that $z \cdot \bar z = (x^2 + y^2) + i \cdot 0 = x^2 + y^2 \ne 0$
if $z \ne 0$. Thus $z^{-1}$ should be taken as
$$z^{-1} = \frac{1}{x^2 + y^2} \cdot \bar z = \frac{x}{x^2 + y^2} - i\,\frac{y}{x^2 + y^2} .$$
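
The definitions (5.99) and (5.100), together with the formula for the inverse,
translate directly into code on ordered pairs of reals. The following minimal
sketch is an illustration only; the function names `add`, `mul`, and `inverse`
are choices made here, not notation from the text.

```python
def add(z, w):
    # Definition (5.99): componentwise addition of pairs (x, y).
    (x1, y1), (x2, y2) = z, w
    return (x1 + x2, y1 + y2)

def mul(z, w):
    # Definition (5.100): (x1 x2 - y1 y2, x1 y2 + x2 y1).
    (x1, y1), (x2, y2) = z, w
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)

def inverse(z):
    # Reciprocal via the conjugate: (x, -y) / (x^2 + y^2).
    x, y = z
    n = x * x + y * y
    if n == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (x / n, -y / n)

i = (0.0, 1.0)
print(mul(i, i))                               # the pair representing -1
print(mul((3.0, 4.0), inverse((3.0, 4.0))))    # approximately the pair (1, 0)
```

Working with pairs rather than with the symbol $i$ makes it visible that only
real arithmetic is used in the definitions.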


b. Geometric Interpretation of the Field C We remark that once the 
algebraic operations (5.99) and (5.100) on complex numbers have been intro- 
duced, the symbol i, which led us to these definitions, is no longer needed. 
We can identify the complex number z = x + iy with the ordered pair (x,y) 
of real numbers, called respectively the real part and the imaginary part of 
the complex number z. (The notation for this is x = Rez, y = Imz.) 

But then, regarding the pair (x,y) as the Cartesian coordinates of a point 
of the plane R? = R x R, one can identify complex numbers with the points 
of this plane or with two-dimensional vectors having coordinates (x,y). 

In such a vector interpretation the coordinatewise addition (5.99) of complex
numbers corresponds to vector addition. Moreover such an interpretation
naturally leads to the idea of the absolute value or modulus $|z|$ of a complex
number as the absolute value or length of the vector $(x, y)$ corresponding to
it, that is
$$|z| = \sqrt{x^2 + y^2} , \quad z = x + iy , \tag{5.101}$$
and also to a way of measuring the distance between complex numbers $z_1$ and
$z_2$ as the distance between the points of the plane corresponding to them,
that is, as
$$|z_1 - z_2| = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} . \tag{5.102}$$


The set of complex numbers, interpreted as the set of points of the plane, 
is called the complex plane and also denoted by C, just as the set of real 
numbers and the real line are both denoted by R. 

Since a point of the plane can also be defined in polar coordinates
$(r, \varphi)$ connected with Cartesian coordinates by the relations
$$x = r\cos\varphi , \qquad y = r\sin\varphi , \tag{5.103}$$
the complex number
$$z = x + iy \tag{5.104}$$
can be represented in the form
$$z = r(\cos\varphi + i\sin\varphi) . \tag{5.105}$$

The expressions (5.104) and (5.105) are called respectively the algebraic
and trigonometric (polar) forms of the complex number.

In the expression (5.105) the number $r \ge 0$ is called the modulus or
absolute value of the complex number $z$ (since, as one can see from (5.103),
$r = |z|$), and $\varphi$ the argument of $z$. The argument has meaning only for
$z \ne 0$. Since the functions $\cos\varphi$ and $\sin\varphi$ are periodic, the
argument of a complex number is determined only up to a multiple of $2\pi$, and
the symbol Arg $z$ denotes the set of angles of the form $\varphi + 2\pi k$,
$k \in$ Z, where $\varphi$ is any angle satisfying (5.105). When it is desirable
for every complex number to determine uniquely some angle $\varphi \in$ Arg $z$,
one must agree in advance on the range from which the argument is to be chosen.
This range is usually either $0 \le \varphi < 2\pi$ or $-\pi < \varphi \le \pi$.
If such a choice has been made, we say that a branch (or the principal branch)
of the argument has been chosen. The values of the argument within the chosen
range are usually denoted arg $z$.

The trigonometric form (5.105) for writing complex numbers is convenient
in carrying out the operation of multiplication of complex numbers. In fact,
if
$$z_1 = r_1(\cos\varphi_1 + i\sin\varphi_1) , \qquad
z_2 = r_2(\cos\varphi_2 + i\sin\varphi_2) ,$$
then
$$\begin{aligned}
z_1 \cdot z_2 &= (r_1\cos\varphi_1 + ir_1\sin\varphi_1)(r_2\cos\varphi_2 + ir_2\sin\varphi_2) = \\
&= (r_1r_2\cos\varphi_1\cos\varphi_2 - r_1r_2\sin\varphi_1\sin\varphi_2) + \\
&\quad + i(r_1r_2\sin\varphi_1\cos\varphi_2 + r_1r_2\cos\varphi_1\sin\varphi_2) ,
\end{aligned}$$
or
$$z_1 \cdot z_2 = r_1r_2\big(\cos(\varphi_1 + \varphi_2) + i\sin(\varphi_1 + \varphi_2)\big) . \tag{5.106}$$

Thus, when two complex numbers are multiplied, their moduli are multiplied
and their arguments are added.
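
Formula (5.106) can be checked with Python's standard `cmath` module, whose
`polar` and `rect` functions convert between the algebraic and trigonometric
forms. This is a small illustrative sketch; the function name `polar_product`
is an assumption made here.

```python
import cmath

def polar_product(z1, z2):
    """Multiply via (5.106): multiply the moduli and add the arguments."""
    r1, phi1 = cmath.polar(z1)
    r2, phi2 = cmath.polar(z2)
    return cmath.rect(r1 * r2, phi1 + phi2)

z1, z2 = 1 + 2j, -3 + 0.5j
print(polar_product(z1, z2), z1 * z2)
```

Note that `phi1 + phi2` may fall outside the range $]-\pi, \pi]$ used by
`cmath.polar`; it is still one admissible value of the argument of the
product, which differs from the principal one by a multiple of $2\pi$.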

We remark that what we have actually shown is that if $\varphi_1 \in$ Arg $z_1$
and $\varphi_2 \in$ Arg $z_2$, then $\varphi_1 + \varphi_2 \in$ Arg $(z_1 \cdot z_2)$.
But since the argument is defined only up to a multiple of $2\pi$, we can write
that
$$\operatorname{Arg}(z_1 \cdot z_2) = \operatorname{Arg} z_1 + \operatorname{Arg} z_2 , \tag{5.107}$$
interpreting this equality as set equality, the set on the right-hand side being
the set of all numbers of the form $\varphi_1 + \varphi_2$, where
$\varphi_1 \in$ Arg $z_1$ and $\varphi_2 \in$ Arg $z_2$. Thus it is useful to
interpret the sum of the arguments in the sense of the set equality (5.107).

With this understanding of equality of arguments, one can assert, for 
example, that two complex numbers are equal if and only if their moduli and 
arguments are equal. 

The following formula of de Moivre²² follows by induction from formula
(5.106):
$$\text{if } z = r(\cos\varphi + i\sin\varphi) , \text{ then }
z^n = r^n(\cos n\varphi + i\sin n\varphi) . \tag{5.108}$$

²² A. de Moivre (1667-1754), British mathematician.

Taking account of the explanations given in connection with the argument
of a complex number, one can use de Moivre's formula to write out explicitly
all the complex solutions of the equation $z^n = a$.

Indeed, if
$$a = \rho(\cos\psi + i\sin\psi)$$
and, by formula (5.108),
$$z^n = r^n(\cos n\varphi + i\sin n\varphi) ,$$
we have $r = \sqrt[n]{\rho}$ and $n\varphi = \psi + 2\pi k$, $k \in$ Z, from
which we have $\varphi_k = \frac{\psi}{n} + \frac{2\pi}{n}k$. Different complex
numbers are obviously obtained only for $k = 0, 1, \ldots, n - 1$.
Thus we find $n$ distinct roots of $a$:
$$z_k = \sqrt[n]{\rho}\Big(\cos\Big(\frac{\psi}{n} + \frac{2\pi}{n}k\Big)
+ i\sin\Big(\frac{\psi}{n} + \frac{2\pi}{n}k\Big)\Big) \quad (k = 0, 1, \ldots, n - 1) .$$
In particular, if $a = 1$, that is, $\rho = 1$ and $\psi = 0$, we have
$$z_k = \sqrt[n]{1} = \cos\Big(\frac{2\pi}{n}k\Big) + i\sin\Big(\frac{2\pi}{n}k\Big)
\quad (k = 0, 1, \ldots, n - 1) .$$

These points are located on the unit circle at the vertices of a regular
n-gon.
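
The root formula lends itself to a short computation. The sketch below is an
illustration under the assumption $a \ne 0$; the function name `nth_roots` is
hypothetical, while `cmath.polar` and `cmath.rect` are the standard-library
conversions between the algebraic and trigonometric forms.

```python
import cmath
import math

def nth_roots(a, n):
    """All n complex solutions of z**n == a for a != 0, via de Moivre's formula."""
    rho, psi = cmath.polar(a)
    r = rho ** (1.0 / n)
    return [cmath.rect(r, (psi + 2 * math.pi * k) / n) for k in range(n)]

for z in nth_roots(-8, 3):
    print(z, z**3)
```

Each printed cube should be close to $-8$, up to rounding error, and the three
roots lie on the circle of radius 2 at angular spacing $2\pi/3$.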

In connection with the geometric interpretation of the complex numbers 
themselves, it is useful to recall the geometric interpretation of the arithmetic 
operations on them. 

For a fixed $b \in$ C, the sum $z + b$ can be interpreted as the mapping of C
into itself given by the formula $z \mapsto z + b$. This mapping is a translation
of the plane by the vector $b$.

For a fixed $a = |a|(\cos\varphi + i\sin\varphi) \ne 0$, the product $az$ can be
interpreted as the mapping $z \mapsto az$ of C into itself, which is the
composition of a dilation by a factor of $|a|$ and a rotation through the angle
$\varphi \in$ Arg $a$. This is clear from formula (5.106).


5.5.2 Convergence in C and Series with Complex Terms 


The distance (5.102) between complex numbers enables us to define the
$\varepsilon$-neighborhood of a number $z_0 \in$ C as the set
$\{z \in \mathbb{C} \mid |z - z_0| < \varepsilon\}$. This set is a disk (without the
boundary circle) of radius $\varepsilon$ centered at the point $(x_0, y_0)$ if
$z_0 = x_0 + iy_0$.

We shall say that a sequence $\{z_n\}$ of complex numbers converges to
$z_0 \in$ C if $\lim_{n\to\infty} |z_n - z_0| = 0$.

It is clear from the inequalities
$$\max\{|x_n - x_0|, |y_n - y_0|\} \le |z_n - z_0| \le |x_n - x_0| + |y_n - y_0| \tag{5.109}$$
that a sequence of complex numbers converges if and only if the sequences of
real and imaginary parts of the terms of the sequence both converge.

By analogy with sequences of real numbers, a sequence of complex numbers
$\{z_n\}$ is called a fundamental or Cauchy sequence if for every
$\varepsilon > 0$ there exists an index $N \in$ N such that
$|z_n - z_m| < \varepsilon$ for all $n, m > N$.

It is clear from inequalities (5.109) that a sequence of complex numbers 
is a Cauchy sequence if and only if the sequences of real and imaginary parts 
of its terms are both Cauchy sequences. 

Taking the Cauchy convergence criterion for sequences of real numbers 
into account, we conclude on the basis of (5.109) that the following proposi- 
tion holds. 


Proposition 1. (The Cauchy criterion). A sequence of complex numbers 
converges if and only if it is a Cauchy sequence. 


If we interpret the sum of a series of complex numbers
$$z_1 + z_2 + \cdots + z_n + \cdots \tag{5.110}$$
as the limit of its partial sums $s_n = z_1 + \cdots + z_n$ as $n \to \infty$, we
also obtain the Cauchy criterion for convergence of the series (5.110).


Proposition 2. The series (5.110) converges if and only if for every
$\varepsilon > 0$ there exists $N \in$ N such that
$$|z_m + \cdots + z_n| < \varepsilon \tag{5.111}$$
for any natural numbers $n \ge m > N$.


From this one can see that a necessary condition for convergence of the
series (5.110) is that $z_n \to 0$ as $n \to \infty$. (This, however, is also
clear from the very definition of convergence.)

As in the real case, the series (5.110) is absolutely convergent if the series
$$|z_1| + |z_2| + \cdots + |z_n| + \cdots \tag{5.112}$$
converges.
It follows from the Cauchy criterion and the inequality
$$|z_m + \cdots + z_n| \le |z_m| + \cdots + |z_n|$$
that if the series (5.110) converges absolutely, then it converges.


Examples. The series

1) $1 + \frac{1}{1!}z + \frac{1}{2!}z^2 + \cdots$ ,
2) $z - \frac{1}{3!}z^3 + \frac{1}{5!}z^5 - \cdots$ ,
3) $1 - \frac{1}{2!}z^2 + \frac{1}{4!}z^4 - \cdots$

converge absolutely for all $z \in$ C, since the series

1') $1 + \frac{1}{1!}|z| + \frac{1}{2!}|z|^2 + \cdots$ ,
2') $|z| + \frac{1}{3!}|z|^3 + \frac{1}{5!}|z|^5 + \cdots$ ,
3') $1 + \frac{1}{2!}|z|^2 + \frac{1}{4!}|z|^4 + \cdots$

all converge for any value of $|z| \in$ R. We remark that we have used the
equality $|z^n| = |z|^n$ here.


Example 4. The series $1 + z + z^2 + \cdots$ converges absolutely for $|z| < 1$
and its sum is $s = \frac{1}{1 - z}$. For $|z| \ge 1$ it does not converge, since
in that case the general term does not tend to zero.
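
The behavior described in Example 4 is easy to observe numerically. The sketch
below is an illustration only; the function name `geometric_partial_sum` and the
sample point are assumptions chosen here.

```python
def geometric_partial_sum(z, n):
    """Return 1 + z + ... + z**n, accumulated term by term."""
    total, term = 0j, 1 + 0j
    for _ in range(n + 1):
        total += term
        term *= z
    return total

z = 0.5 + 0.5j                      # |z| < 1
print(geometric_partial_sum(z, 200), 1 / (1 - z))
```

For $|z| < 1$ the partial sums approach $\frac{1}{1 - z}$; for $|z| = 1$ the
terms $z^n$ all have modulus 1, so the partial sums cannot settle down.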
Series of the form
$$c_0 + c_1(z - z_0) + \cdots + c_n(z - z_0)^n + \cdots \tag{5.113}$$
are called power series.
By applying the Cauchy criterion (Subsect. 3.1.4) to the series
$$|c_0| + |c_1(z - z_0)| + \cdots + |c_n(z - z_0)^n| + \cdots , \tag{5.114}$$
we conclude that this series converges if
$$|z - z_0| < \Big(\limsup_{n\to\infty}\sqrt[n]{|c_n|}\Big)^{-1} ,$$
and that the general term does not tend to zero if
$|z - z_0| > \big(\limsup_{n\to\infty}\sqrt[n]{|c_n|}\big)^{-1}$.

From this we obtain the following proposition.

Proposition 3. (The Cauchy-Hadamard formula).²³ The power series
(5.113) converges inside the disk $|z - z_0| < R$ with center at $z_0$ and radius
given by the Cauchy-Hadamard formula
$$R = \frac{1}{\limsup\limits_{n\to\infty}\sqrt[n]{|c_n|}} . \tag{5.115}$$
At any point exterior to this disk the power series diverges.
At any point interior to the disk, the power series converges absolutely.


Remark. In regard to convergence on the boundary circle $|z - z_0| = R$
Proposition 3 is silent, since all the logically admissible possibilities really
can occur.


²³ J. Hadamard (1865-1963), well-known French mathematician.

Examples. The series

5) $\sum_{n=1}^{\infty} z^n$ ,
6) $\sum_{n=1}^{\infty} \frac{1}{n}z^n$ ,
and
7) $\sum_{n=1}^{\infty} \frac{1}{n^2}z^n$

converge in the unit disk $|z| < 1$, but the series 5) diverges at every point
$z$ where $|z| = 1$. The series 6) diverges for $z = 1$ and (as one can show)
converges for $z = -1$. The series 7) converges absolutely for $|z| = 1$, since
$\left|\frac{1}{n^2}z^n\right| = \frac{1}{n^2}$.


One must keep in mind the possible degenerate case when $R = 0$ in
(5.115), which was not taken account of in Proposition 3. In this case, of
course, the entire disk of convergence degenerates to the single point $z_0$ of
convergence of the series (5.113).
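
The Cauchy-Hadamard formula involves an upper limit, which no finite
computation can evaluate exactly. Still, a crude heuristic gives a feeling for
it: take the largest value of $\sqrt[n]{|c_n|}$ over a block of large indices.
The sketch below is only such a heuristic; the function name `radius_estimate`,
its parameters, and the use of $\ln|c_n|$ as input (to avoid overflow) are all
assumptions made for this illustration.

```python
import math

def radius_estimate(log_abs_c, n_max=10_000, tail=100):
    """Heuristic for R = 1 / limsup |c_n|**(1/n).

    log_abs_c(n) must return ln|c_n|. The supremum over the last `tail`
    indices up to n_max stands in for the upper limit.
    """
    sup_tail = max(math.exp(log_abs_c(n) / n)
                   for n in range(n_max - tail + 1, n_max + 1))
    return 1.0 / sup_tail

# c_n = 2^n / n, for which the formula gives R = 1/2.
print(radius_estimate(lambda n: n * math.log(2) - math.log(n)))

# c_n = 1 / n!, for which the n-th roots tend to 0, so the estimate is large.
print(radius_estimate(lambda n: -math.lgamma(n + 1)))
```

The first estimate should be near 0.5. The second grows without bound as
`n_max` increases, which corresponds to a series converging in the whole plane.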

The following result is an obvious corollary of Proposition 3. 


Corollary (Abel's first theorem on power series). If the power series (5.113)
converges at some value $z^*$, then it converges, and indeed even absolutely, for
any value of $z$ satisfying the inequality $|z - z_0| < |z^* - z_0|$.

The propositions obtained up to this point can be regarded as simple 
extensions of facts already known to us. We shall now prove two general 
propositions about series that we have not proved up to now in any form, 
although we have partly discussed some of the questions they address. 


Proposition 4. If a series $z_1 + z_2 + \cdots + z_n + \cdots$ of complex numbers
converges absolutely, then a series $z_{n_1} + z_{n_2} + \cdots + z_{n_k} + \cdots$
obtained by rearranging²⁴ its terms also converges absolutely and has the same
sum.


Proof. Using the convergence of the series $\sum_{n=1}^{\infty} |z_n|$, given a
number $\varepsilon > 0$, we choose $N \in$ N such that
$\sum_{n=N+1}^{\infty} |z_n| < \varepsilon$.
We then find an index $K \in$ N such that all the terms in the sum
$s_N = z_1 + \cdots + z_N$ are among the terms of the sum
$\tilde s_k = z_{n_1} + \cdots + z_{n_k}$ for $k > K$.
If $s = \sum_{n=1}^{\infty} z_n$, we find that for $k > K$
$$|s - \tilde s_k| \le |s - s_N| + |s_N - \tilde s_k| \le
\sum_{n=N+1}^{\infty} |z_n| + \sum_{n=N+1}^{\infty} |z_n| < 2\varepsilon .$$

Thus we have shown that $\tilde s_k \to s$ as $k \to \infty$. If we apply what has
just been proved to the series $|z_1| + |z_2| + \cdots + |z_n| + \cdots$ and
$|z_{n_1}| + |z_{n_2}| + \cdots + |z_{n_k}| + \cdots$, we find that the latter
series converges. Thus Proposition 4 is now completely proved. □


²⁴ The term with index $k$ in this series is the term $z_{n_k}$ with index $n_k$
in the original series. Here the mapping N $\ni k \mapsto n_k \in$ N is assumed to
be a bijective mapping on the set N.


Our next proposition will involve the product of two series
$$(a_1 + a_2 + \cdots + a_n + \cdots) \cdot (b_1 + b_2 + \cdots + b_n + \cdots) .$$

The problem is that if we remove the parentheses and form all possible pairwise
products $a_ib_j$, there is no natural order for summing these products,
since we have two indices of summation. The set of pairs $(i, j)$, where
$i, j \in$ N, is countable, as we know. Therefore we could write down a series
having the products $a_ib_j$ as terms in some order. The sum of such a series
might depend on the order in which these terms are taken. But, as we have just
seen, in absolutely convergent series the sum is independent of any
rearrangement of the terms. Thus, it is desirable to determine when the series
with terms $a_ib_j$ converges absolutely.


Proposition 5. The product of absolutely convergent series is an absolutely 
convergent series whose sum equals the product of the sums of the factor 
series. 


Proof. We begin by remarking that whatever finite sum $\sum a_i b_j$ of terms of
the form $a_i b_j$ we take, we can always find $N$ such that the product of the
sums $A_N = a_1 + \cdots + a_N$ and $B_N = b_1 + \cdots + b_N$ contains all the terms in
that sum. Therefore

$$\Bigl|\sum a_i b_j\Bigr| \le \sum |a_i b_j| \le \sum_{i,j=1}^{N} |a_i b_j| = \sum_{i=1}^{N} |a_i| \cdot \sum_{j=1}^{N} |b_j| \le \sum_{i=1}^{\infty} |a_i| \cdot \sum_{j=1}^{\infty} |b_j| ,$$

from which it follows that the series $\sum_{i,j=1}^{\infty} a_i b_j$ converges absolutely and that
its sum is uniquely determined independently of the order of the factors. In
that case the sum can be obtained, for example, as the limit of the products
of the sums $A_n = a_1 + \cdots + a_n$ and $B_n = b_1 + \cdots + b_n$. But $A_n B_n \to AB$
as $n \to \infty$, where $A = \sum_{n=1}^{\infty} a_n$ and $B = \sum_{n=1}^{\infty} b_n$, which completes the proof of
Proposition 5. □
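The following example is very important.


Example 8. The series $\sum_{n=0}^{\infty} \frac{1}{n!} a^n$ and $\sum_{m=0}^{\infty} \frac{1}{m!} b^m$ converge absolutely. In the
product of these series let us group together all monomials of the form $a^n b^m$
having the same total degree $n + m = k$. We then obtain the series

$$\sum_{k=0}^{\infty} \Bigl( \sum_{n+m=k} \frac{1}{n!\,m!} a^n b^m \Bigr) .$$

But

$$\sum_{m+n=k} \frac{1}{n!\,m!} a^n b^m = \frac{1}{k!} \sum_{n=0}^{k} \frac{k!}{n!\,(k-n)!} a^n b^{k-n} = \frac{1}{k!} (a + b)^k ,$$

and therefore we find that

$$\sum_{n=0}^{\infty} \frac{1}{n!} a^n \cdot \sum_{m=0}^{\infty} \frac{1}{m!} b^m = \sum_{k=0}^{\infty} \frac{1}{k!} (a + b)^k . \tag{5.116}$$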
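The grouping by total degree in Example 8 can be checked numerically. The sketch below is an illustration only: the sample values of `a` and `b` and the truncation degree 30 are arbitrary assumptions, and the function names are hypothetical.

```python
import cmath
from math import factorial

def grouped_product_sum(a, b, max_degree):
    """Sum over k <= max_degree of the inner sums over n + m = k of a**n b**m / (n! m!)."""
    total = 0
    for k in range(max_degree + 1):
        for n in range(k + 1):
            m = k - n
            total += a ** n * b ** m / (factorial(n) * factorial(m))
    return total

def exp_partial_sum(z, max_degree):
    """Partial sum of the series sum z**k / k! up to degree max_degree."""
    return sum(z ** k / factorial(k) for k in range(max_degree + 1))

a, b = 0.7 + 0.2j, -0.3 + 1.1j
grouped = grouped_product_sum(a, b, 30)
direct = exp_partial_sum(a + b, 30)
print(abs(grouped - direct))                 # agreement of the two sides of (5.116)
print(abs(direct - cmath.exp(a + b)))        # comparison with the library exponential
```

The first difference is at the level of rounding error because the inner sums coincide term by term with $(a+b)^k/k!$, which is exactly the binomial identity used in the example.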


5.5.3 Euler’s Formula and the Connections 
Among the Elementary Functions 


In Examples 1-3 we established the absolute convergence in $\mathbb{C}$ of the series
obtained by extending into the complex domain the Taylor series of the
functions $e^x$, $\sin x$, and $\cos x$, which are defined on $\mathbb{R}$. For that reason, the
following definitions are natural ones to make for the functions $e^z$, $\cos z$, and
$\sin z$ in $\mathbb{C}$:

$$e^z = \exp z := 1 + \frac{1}{1!} z + \frac{1}{2!} z^2 + \frac{1}{3!} z^3 + \cdots , \tag{5.117}$$

$$\cos z := 1 - \frac{1}{2!} z^2 + \frac{1}{4!} z^4 - \cdots , \tag{5.118}$$

$$\sin z := z - \frac{1}{3!} z^3 + \frac{1}{5!} z^5 - \cdots . \tag{5.119}$$


Following Euler,²⁵ let us make the substitution $z = iy$ in Eq. (5.117). By
suitably grouping the terms of the partial sums of the resulting series, we
find that

$$1 + \frac{1}{1!}(iy) + \frac{1}{2!}(iy)^2 + \frac{1}{3!}(iy)^3 + \frac{1}{4!}(iy)^4 + \frac{1}{5!}(iy)^5 + \cdots = \Bigl(1 - \frac{1}{2!} y^2 + \frac{1}{4!} y^4 - \cdots\Bigr) + i\Bigl(\frac{1}{1!} y - \frac{1}{3!} y^3 + \frac{1}{5!} y^5 - \cdots\Bigr) ,$$

that is,

$$e^{iy} = \cos y + i \sin y . \tag{5.120}$$

This is the famous Euler formula.
In deriving it we used the fact that $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, $i^5 = i$, and
so forth. The number $y$ in formula (5.120) may be either a real number or an
arbitrary complex number.

25 L. Euler (1707-1783), eminent mathematician and specialist in theoretical mechanics,
of Swiss extraction, who lived the majority of his life in St. Petersburg.
In the words of Laplace, “Euler is the common teacher of all mathematicians of
the second half of the eighteenth century.”

It follows from the definitions (5.118) and (5.119) that

$$\cos(-z) = \cos z , \qquad \sin(-z) = -\sin z ,$$

that is, $\cos z$ is an even function and $\sin z$ is an odd function. Thus

$$e^{-iy} = \cos y - i \sin y .$$
Comparing this last equality with formula (5.120), we obtain

$$\cos y = \frac{1}{2}\bigl(e^{iy} + e^{-iy}\bigr) , \qquad \sin y = \frac{1}{2i}\bigl(e^{iy} - e^{-iy}\bigr) .$$


Since $y$ is any complex number, it would be better to rewrite these equalities
using notation that leaves no doubt of this fact:

$$\cos z = \frac{1}{2}\bigl(e^{iz} + e^{-iz}\bigr) , \qquad \sin z = \frac{1}{2i}\bigl(e^{iz} - e^{-iz}\bigr) . \tag{5.121}$$
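Euler's formula can be tested directly on the partial sums of the defining series (5.117)-(5.119). This is a minimal sketch: the numbers of terms and the sample arguments, one of them complex, are assumptions made here, and the function names are hypothetical.

```python
import cmath
from math import factorial

def exp_series(z, terms=40):
    """Partial sum of the series (5.117), degrees 0..terms-1."""
    return sum(z ** n / factorial(n) for n in range(terms))

def cos_series(z, terms=20):
    """Partial sum of the series (5.118), even degrees 0..2*terms-2."""
    return sum((-1) ** n * z ** (2 * n) / factorial(2 * n) for n in range(terms))

def sin_series(z, terms=20):
    """Partial sum of the series (5.119), odd degrees 1..2*terms-1."""
    return sum((-1) ** n * z ** (2 * n + 1) / factorial(2 * n + 1) for n in range(terms))

for y in (0.5, 2.0, 1.0 - 0.75j):   # real and complex values of y
    lhs = exp_series(1j * y)
    rhs = cos_series(y) + 1j * sin_series(y)
    print(y, abs(lhs - rhs))
```

With 40 terms of the exponential series and 20 terms of each trigonometric series, the two sides contain exactly the same monomials, so the printed differences are pure rounding error.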


Thus, if we assume that $\exp z$ is defined by relation (5.117), then formulas
(5.121), which are equivalent to the expansions (5.118) and (5.119), like the
formulas

$$\cosh z = \frac{1}{2}\bigl(e^{z} + e^{-z}\bigr) , \qquad \sinh z = \frac{1}{2}\bigl(e^{z} - e^{-z}\bigr) , \tag{5.122}$$

can be taken as the definitions of the corresponding circular and hyperbolic
functions. Disregarding all the considerations about trigonometric functions
that led us to this step, which have not been rigorously justified (even though
they did lead us to Euler’s formula), we can now perform a typical mathematical
trick and take formulas (5.121) and (5.122) as definitions and obtain
from them in a completely formal manner all the properties of the circular
and hyperbolic functions.
For example, the fundamental identities

$$\cos^2 z + \sin^2 z = 1 , \qquad \cosh^2 z - \sinh^2 z = 1 ,$$

like the parity properties, can be verified immediately.

The deeper properties, such as, for example, the formulas for the cosine
and sine of a sum, follow from the characteristic property of the exponential
function:

$$\exp(z_1 + z_2) = \exp(z_1) \cdot \exp(z_2) , \tag{5.123}$$


which obviously follows from the definition (5.117) and formula (5.116). Let
us derive the formulas for the cosine and sine of a sum.
On the one hand, by Euler’s formula

$$e^{i(z_1 + z_2)} = \cos(z_1 + z_2) + i \sin(z_1 + z_2) . \tag{5.124}$$

On the other hand, by the property of the exponential function and Euler’s
formula

$$e^{i(z_1 + z_2)} = e^{iz_1} e^{iz_2} = (\cos z_1 + i \sin z_1)(\cos z_2 + i \sin z_2) = (\cos z_1 \cos z_2 - \sin z_1 \sin z_2) + i(\sin z_1 \cos z_2 + \cos z_1 \sin z_2) . \tag{5.125}$$


If $z_1$ and $z_2$ were real numbers, then, equating the real and imaginary
parts of the numbers in formulas (5.124) and (5.125), we would now have
obtained the required formulas. Since we are trying to prove them for any
$z_1, z_2 \in \mathbb{C}$, we use the fact that $\cos z$ is even and $\sin z$ is odd to obtain yet
another equality:

$$e^{-i(z_1 + z_2)} = (\cos z_1 \cos z_2 - \sin z_1 \sin z_2) - i(\sin z_1 \cos z_2 + \cos z_1 \sin z_2) . \tag{5.126}$$

Comparing (5.125) and (5.126), we find

$$\cos(z_1 + z_2) = \frac{1}{2}\bigl(e^{i(z_1 + z_2)} + e^{-i(z_1 + z_2)}\bigr) = \cos z_1 \cos z_2 - \sin z_1 \sin z_2 ,$$

$$\sin(z_1 + z_2) = \frac{1}{2i}\bigl(e^{i(z_1 + z_2)} - e^{-i(z_1 + z_2)}\bigr) = \sin z_1 \cos z_2 + \cos z_1 \sin z_2 .$$


The corresponding formulas for the hyperbolic functions cosh z and sinh z 
could be obtained in a completely analogous manner. Incidentally, as can be 
seen from formulas (5.121) and (5.122), these functions are connected with 
cos z and sin z by the relations 


$$\cosh z = \cos iz , \qquad \sinh z = -i \sin iz .$$
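The identities obtained formally from (5.121) and (5.122) can be spot-checked in floating-point arithmetic. The following sketch takes the definitions literally, using the library function `cmath.exp` in place of the series (5.117); the sample points `z1`, `z2` and the function names are arbitrary assumptions.

```python
import cmath

def cos_c(z):
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2      # definition (5.121)

def sin_c(z):
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / 2j     # definition (5.121)

def cosh_c(z):
    return (cmath.exp(z) + cmath.exp(-z)) / 2                # definition (5.122)

def sinh_c(z):
    return (cmath.exp(z) - cmath.exp(-z)) / 2                # definition (5.122)

z1, z2 = 0.3 + 1.2j, -1.1 + 0.4j
residuals = {
    "cos^2 + sin^2 - 1": cos_c(z1) ** 2 + sin_c(z1) ** 2 - 1,
    "cosh^2 - sinh^2 - 1": cosh_c(z1) ** 2 - sinh_c(z1) ** 2 - 1,
    "cos of a sum": cos_c(z1 + z2) - (cos_c(z1) * cos_c(z2) - sin_c(z1) * sin_c(z2)),
    "sin of a sum": sin_c(z1 + z2) - (sin_c(z1) * cos_c(z2) + cos_c(z1) * sin_c(z2)),
    "cosh z - cos iz": cosh_c(z1) - cos_c(1j * z1),
    "sinh z + i sin iz": sinh_c(z1) + 1j * sin_c(1j * z1),
}
for name, value in residuals.items():
    print(name, abs(value))
```

Every residual should be of the order of machine precision; a numerical check of this kind is not a proof, but it is a quick way to catch a sign error in a formula.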


However, to obtain even such geometrically obvious facts as the equality
$\sin \pi = 0$ or $\cos(z + 2\pi) = \cos z$ from the definitions (5.121) and (5.122) is
very difficult. Hence, while striving for precision, one must not forget the
problems where these functions naturally arise. For that reason, we shall 
not attempt at this point to overcome the potential difficulties connected 
with the definitions (5.121) and (5.122) when describing the properties of the 
trigonometric functions. We shall return to these functions after presenting 
the theory of integration. Our purpose at present was only to demonstrate the 
remarkable unity of seemingly completely different functions, which would 
have been impossible to detect without going into the domain of complex 
numbers. 
If we take as known that for $x \in \mathbb{R}$

$$\cos(x + 2\pi) = \cos x , \qquad \sin(x + 2\pi) = \sin x ,$$

$$\cos 0 = 1 , \qquad \sin 0 = 0 ,$$


then from Euler’s formula (5.120) we obtain the relation

$$e^{i\pi} + 1 = 0 , \tag{5.127}$$

in which all the most important constants of the different areas of mathematics
are represented: 1 (arithmetic), $\pi$ (geometry), $e$ (analysis), and $i$ (algebra).
From (5.123) and (5.127), as well as from (5.120), one can see that

$$\exp(z + i2\pi) = \exp z ,$$

that is, the exponential function is a periodic function on $\mathbb{C}$ with the purely
imaginary period $T = i2\pi$.

Taking account of Euler’s formula, we can now represent the trigonometric
notation (5.105) for a complex number in the form

$$z = re^{i\varphi} ,$$

where $r$ is the modulus of $z$ and $\varphi$ its argument.
The formula of de Moivre now becomes very simple:

$$z^n = r^n e^{in\varphi} . \tag{5.128}$$


5.5.4 Power Series Representation of a Function. Analyticity 


A function $w = f(z)$ of a complex variable $z$ with complex values $w$, defined
on a set $E \subset \mathbb{C}$, is a mapping $f : E \to \mathbb{C}$. The graph of such a function
is a subset of $\mathbb{C} \times \mathbb{C} = \mathbb{R}^2 \times \mathbb{R}^2 = \mathbb{R}^4$, and therefore is not visualizable in
the traditional way. To compensate for this loss to some extent, one usually
keeps two copies of the complex plane $\mathbb{C}$, indicating points of the domain of
definition in one and points of the range of values in the other.

In the examples below the domain E and its image under the correspond- 
ing mapping are indicated. 
Example 9.


Example 10. $z \mapsto z + i = w$.


Fig. 5.26.


Example 11.


Fig. 5.27.


These correspondences follow from the equalities $i = e^{i\pi/2}$, $z = re^{i\varphi}$, and $iz = re^{i(\varphi + \pi/2)}$,
that is, a rotation through angle $\frac{\pi}{2}$ has occurred.


Example 12. $z \mapsto z^2 = w$.


Fig. 5.28.


For, if $z = re^{i\varphi}$, then $z^2 = r^2 e^{i2\varphi}$.
Example 13.


Example 14. $z \mapsto z^2 = w$.


Fig. 5.30.


It is clear from Examples 12 and 13 that under this function the unit disk maps
into itself, but is covered twice.


Example 15.


Fig. 5.31.


If $z = re^{i\varphi}$, then by (5.128), we have $z^n = r^n e^{in\varphi}$, so that in this case the image
of the disk of radius $r$ is the disk of radius $r^n$, each point of which is the image
of $n$ points in the original disk (located, as it happens, at the vertices of a regular
$n$-gon).

The only exception is the point $w = 0$, whose pre-image is the point $z = 0$. However,
as $z \to 0$, the function $z^n$ is an infinitesimal of order $n$, and so we say that at $z = 0$
the function has a zero of order $n$. Taking account of this kind of multiplicity, one
can now say that the number of pre-images of every point $w$ under the mapping
$z \mapsto z^n = w$ is $n$. In particular, the equation $z^n = 0$ has the $n$ coincident roots
$z_1 = \cdots = z_n = 0$.
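The $n$ pre-images of a point $w \ne 0$ under $z \mapsto z^n$ can be computed from the exponential form. The sketch below is an illustration under assumptions: the function name `nth_roots` and the sample value $w = -8i$ are chosen here and are not part of the text.

```python
import cmath

def nth_roots(w, n):
    """All n solutions z of z**n = w for w != 0.

    If w = rho * exp(i*theta), the solutions are
    rho**(1/n) * exp(i*(theta + 2*pi*k)/n), k = 0, ..., n-1.
    """
    rho, theta = cmath.polar(w)
    return [cmath.rect(rho ** (1.0 / n), (theta + 2 * cmath.pi * k) / n)
            for k in range(n)]

roots = nth_roots(-8j, 3)
for z in roots:
    print(z, abs(z), abs(z ** 3 - (-8j)))
```

All three roots have modulus 2 and their arguments differ by $2\pi/3$, so they are the vertices of an equilateral triangle inscribed in the circle $|z| = 2$; in particular their sum is 0.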
In accordance with the general definition of continuity, a function $f(z)$ of
a complex variable is called continuous at a point $z_0 \in \mathbb{C}$ if for any neighborhood
$V(f(z_0))$ of its value $f(z_0)$ there exists a neighborhood $U(z_0)$ such
that $f(z) \in V(f(z_0))$ for all $z \in U(z_0)$. In short,

$$\lim_{z \to z_0} f(z) = f(z_0) .$$


The derivative of a function $f(z)$ at a point $z_0$, as for the real-valued case,
is defined as

$$f'(z_0) = \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0} , \tag{5.129}$$

if this limit exists.
The equality (5.129) is equivalent to

$$f(z) - f(z_0) = f'(z_0)(z - z_0) + o(z - z_0) \tag{5.130}$$

as $z \to z_0$, corresponding to the definition of differentiability of a function at
the point $z_0$.

Since the definition of differentiability in the complex-valued case is the 
same as the corresponding definition for real-valued functions and the arith- 
metic properties of the fields C and R are the same, one may say that all the 
general rules for differentiation hold also in the complex-valued case. 


Example 16.

$$(f + g)'(z) = f'(z) + g'(z) ,$$
$$(f \cdot g)'(z) = f'(z) g(z) + f(z) g'(z) ,$$
$$(g \circ f)'(z) = g'(f(z)) \cdot f'(z) ,$$

so that if $f(z) = z^2$, then $f'(z) = 1 \cdot z + z \cdot 1 = 2z$, or if $f(z) = z^n$, then
$f'(z) = nz^{n-1}$, and if

$$P_n(z) = c_0 + c_1(z - z_0) + \cdots + c_n(z - z_0)^n ,$$

then

$$P_n'(z) = c_1 + 2c_2(z - z_0) + \cdots + nc_n(z - z_0)^{n-1} .$$
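Definition (5.129) requires the difference quotient to have the same limit no matter from which direction $z$ approaches $z_0$. The sketch below checks this for a polynomial; the coefficients, the point `z`, the step size $10^{-6}$ and the function names are hypothetical choices made for illustration.

```python
import cmath

def poly_value(coeffs, z, z0=0):
    """P(z) = sum of coeffs[k] * (z - z0)**k, evaluated by Horner's scheme."""
    result = 0
    for c in reversed(coeffs):
        result = result * (z - z0) + c
    return result

def poly_derivative(coeffs):
    """Coefficients of P'(z): k * c_k is the coefficient of (z - z0)**(k - 1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

coeffs = [1 + 2j, -3, 0.5j, 2]         # hypothetical c_0, ..., c_3
z = 0.4 - 0.3j
exact = poly_value(poly_derivative(coeffs), z)
errors = []
for angle in (0.0, 1.0, 2.5, 4.0):      # approach z along several directions
    h = 1e-6 * cmath.exp(1j * angle)
    quotient = (poly_value(coeffs, z + h) - poly_value(coeffs, z)) / h
    errors.append(abs(quotient - exact))
    print(angle, errors[-1])
```

The quotient agrees with the termwise-differentiated polynomial along every direction tried, up to an error of the order of $|h|$.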


Theorem 1. The sum $f(z) = \sum_{n=0}^{\infty} c_n(z - z_0)^n$ of a power series is an infinitely
differentiable function inside the entire disk in which convergence occurs.
Moreover,

$$f^{(k)}(z) = \sum_{n=0}^{\infty} \frac{d^k}{dz^k}\bigl(c_n(z - z_0)^n\bigr) , \qquad k = 0, 1, \ldots ,$$

and

$$c_n = \frac{1}{n!} f^{(n)}(z_0) , \qquad n = 0, 1, \ldots .$$


Proof. The expressions for the coefficients follow in an obvious way from the
expressions for $f^{(k)}(z)$ for $k = n$ and $z = z_0$.

As for the formula for $f^{(k)}(z)$, it suffices to verify this formula for $k = 1$,
since the function $f'(z)$ will then be the sum of a power series.

Thus, let us verify that the function $\varphi(z) = \sum_{n=1}^{\infty} nc_n(z - z_0)^{n-1}$ is indeed
the derivative of $f(z)$.

We begin by remarking that by the Cauchy-Hadamard formula (5.115)
the radius of convergence of the derived series is the same as the radius of
convergence $R$ of the original power series for $f(z)$.

For simplicity of notation from now on we shall assume that $z_0 = 0$, that
is, $f(z) = \sum_{n=0}^{\infty} c_n z^n$, $\varphi(z) = \sum_{n=1}^{\infty} nc_n z^{n-1}$, and that these series converge for
$|z| < R$.


Since a power series converges absolutely on the interior of its disk of
convergence, we note (and this is crucial) that the estimate $|nc_n z^{n-1}| =
n|c_n||z|^{n-1} \le n|c_n| r^{n-1}$ holds for $|z| \le r < R$, and that the series $\sum_{n=1}^{\infty} n|c_n| r^{n-1}$
converges. Hence, for any $\varepsilon > 0$ there exists an index $N$ such that

$$\Bigl|\sum_{n=N+1}^{\infty} nc_n z^{n-1}\Bigr| \le \sum_{n=N+1}^{\infty} n|c_n| r^{n-1} \le \frac{\varepsilon}{3}$$

for $|z| \le r$.

Thus at any point of the disk $|z| < r$ the function $\varphi(z)$ is within $\frac{\varepsilon}{3}$ of the
$N$th partial sum of the series that defines it.

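Now let $\zeta$ and $z$ be arbitrary points of this disk. The transformation

$$\frac{f(\zeta) - f(z)}{\zeta - z} = \sum_{n=1}^{\infty} c_n \frac{\zeta^n - z^n}{\zeta - z} = \sum_{n=1}^{\infty} c_n\bigl(\zeta^{n-1} + \zeta^{n-2} z + \cdots + \zeta z^{n-2} + z^{n-1}\bigr)$$

and the estimate $|c_n(\zeta^{n-1} + \cdots + z^{n-1})| \le |c_n| n r^{n-1}$ enable us to conclude,
as above, that the difference quotient we are interested in is equal within $\frac{\varepsilon}{3}$
to the partial sum of the series that defines it, provided $|\zeta| < r$ and $|z| < r$.
Hence, for $|\zeta| < r$ and $|z| < r$ we have

$$\Bigl|\frac{f(\zeta) - f(z)}{\zeta - z} - \varphi(z)\Bigr| \le \Bigl|\sum_{n=1}^{N} c_n \frac{\zeta^n - z^n}{\zeta - z} - \sum_{n=1}^{N} nc_n z^{n-1}\Bigr| + 2\,\frac{\varepsilon}{3} .$$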
If we now fix $z$ and let $\zeta$ tend to $z$, passing to the limit in the finite sum,
we see that the right-hand side of this last inequality will be less than $\varepsilon$ for
$\zeta$ sufficiently close to $z$, and hence the left-hand side will be also.

Thus, for any point $z$ in the disk $|z| < r < R$, we have verified that
$f'(z) = \varphi(z)$. Since $r$ is arbitrary, this relation holds for any point of the disk
$|z| < R$. □
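Termwise differentiation can be observed on a series whose sum is known in closed form. The sketch below uses the geometric series $\sum z^n = 1/(1 - z)$, $|z| < 1$, whose derived series $\sum nz^{n-1}$ should sum to $1/(1 - z)^2$; this test series, the sample point and the function names are assumptions made here for illustration.

```python
def geometric_partial(z, N):
    """Partial sum of the series sum z**n for n = 0..N."""
    return sum(z ** n for n in range(N + 1))

def derived_partial(z, N):
    """Partial sum of the termwise-differentiated series sum n * z**(n-1) for n = 1..N."""
    return sum(n * z ** (n - 1) for n in range(1, N + 1))

z = 0.5 + 0.4j                      # |z| is about 0.64, inside the disk of convergence
f_exact = 1 / (1 - z)
df_exact = 1 / (1 - z) ** 2         # derivative of 1/(1 - z)
for N in (10, 50, 200):
    print(N,
          abs(geometric_partial(z, N) - f_exact),
          abs(derived_partial(z, N) - df_exact))
```

Both columns of errors decrease geometrically, the derived series somewhat more slowly because of the extra factor $n$, in agreement with the estimate $n|c_n| r^{n-1}$ used in the proof.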


This theorem enables us to specify the class of functions whose Taylor 
series converge to them. 

A function is analytic at a point $z_0 \in \mathbb{C}$ if it can be represented in a
neighborhood of the point in the following (“analytic”) form:

$$f(z) = \sum_{n=0}^{\infty} c_n(z - z_0)^n ,$$

that is, as the sum of a power series in $z - z_0$.
It is not difficult to verify (see Problem 7 below) that the sum of a power
series is analytic at any interior point of the disk of convergence of the series.
Taking account of the definition of analyticity, we deduce the following
corollary from the theorem just proved.


Corollary. a) If a function is analytic at a point, then it is infinitely differ- 
entiable at that point, and its Taylor series converges to it in a neighborhood 
of the point. 

b) The Taylor series of a function defined in a neighborhood of a point 
and infinitely differentiable at that point converges to the function in some 
neighborhood of the point if and only if the function is analytic. 


In the theory of functions of a complex variable one can prove a remarkable
fact that has no analogue in the theory of functions of a real variable. It
turns out that if a function $f(z)$ is differentiable in a neighborhood of a point
$z_0 \in \mathbb{C}$, then it is analytic at that point. This is certainly an amazing fact,
since it then follows from the theorem just proved that if a function $f(z)$ has
one derivative $f'(z)$ in a neighborhood of a point, it also has derivatives of
all orders in that neighborhood.

At first sight this result is just as surprising as the fact that by adjoining
to $\mathbb{R}$ a root $i$ of the one particular equation $z^2 = -1$ we obtain a field $\mathbb{C}$ in
which every algebraic polynomial $P(z)$ has a root. We intend to make use
of the fact that an algebraic equation $P(z) = 0$ has a solution in $\mathbb{C}$, and for
that reason we shall prove it as a good illustration of the elementary concepts
of complex numbers and functions of a complex variable introduced in this
section.
5.5.5 Algebraic Closedness of the Field C of Complex Numbers


If we prove that every polynomial $P(z) = c_0 + c_1 z + \cdots + c_n z^n$, $n \ge 1$, with
complex coefficients has a root in $\mathbb{C}$, then there will be no need to enlarge
the field $\mathbb{C}$ because some algebraic equation is not solvable in $\mathbb{C}$. In this sense
the assertion that every polynomial $P(z)$ has a root establishes that the field
$\mathbb{C}$ is algebraically closed.

To obtain a clear idea of the reason why every polynomial has a root in 
C while there can fail to be a root in R, we use the geometric interpretation 
of complex numbers and functions of a complex variable. 

We remark that

$$P(z) = c_n z^n \Bigl(1 + \frac{c_{n-1}}{c_n z} + \cdots + \frac{c_0}{c_n z^n}\Bigr) ,$$

so that $P(z) = c_n z^n + o(z^n)$ as $|z| \to \infty$. Since we are interested in finding a
root of the equation $P(z) = 0$, dividing both sides of the equation by $c_n$, we
may assume that the leading coefficient $c_n$ of $P(z)$ equals 1, and hence

$$P(z) = z^n + o(z^n) \quad \text{as } |z| \to \infty . \tag{5.131}$$


If we recall (Example 15) that the circle of radius $r$ maps to the circle
of radius $r^n$ with center at 0 under the mapping $z \mapsto z^n$, we see that for
sufficiently large values of $r$ the image of the circle $|z| = r$ under the mapping
$w = P(z)$ will be, with small relative error, the circle $|w| = r^n$ in the $w$-plane
(Fig. 5.32). What is important is that, in any case, it will be a curve that
encloses the point $w = 0$.


Fig. 5.32.


If the disk $|z| \le r$ is regarded as a film stretched over the circle $|z| = r$,
this film is mapped into a film stretched over the image of that circle under
the mapping $w = P(z)$. But, since the latter encloses the point $w = 0$, some
point of that film must coincide with $w = 0$, and hence there is a point $z_0$ in
the disk $|z| < r$ that maps to $w = 0$ under the mapping $w = P(z)$, that is,
$P(z_0) = 0$.
This intuitive reasoning leads to a number of important and useful concepts
of topology (the index of a path with respect to a point, and the degree
of a mapping), by means of which it can be made into a complete proof that is
valid not only for polynomials, as one can see. However, these considerations
would unfortunately distract us from the main subject we are now studying.
For that reason, we shall give another proof that is more in the mainstream
of the ideas we have already mastered.
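The geometric picture can be explored numerically by counting how many times the image of a circle $|z| = r$ under $w = P(z)$ goes around $w = 0$. The sketch below is an illustration only: the polynomial, the radii and the sampling density are hypothetical choices, and the routine assumes that $P$ has no zero on the circle and that the sampling is fine enough for consecutive image points to differ in argument by less than $\pi$.

```python
import cmath

def poly(coeffs, z):
    """P(z) with coeffs[k] the coefficient of z**k (Horner's scheme)."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result

def winding_number(coeffs, radius, samples=20000):
    """Number of turns of w = P(z) around w = 0 as z runs once around |z| = radius."""
    total = 0.0
    prev = poly(coeffs, radius)
    for k in range(1, samples + 1):
        z = cmath.rect(radius, 2 * cmath.pi * k / samples)
        cur = poly(coeffs, z)
        total += cmath.phase(cur / prev)    # small increment of the argument of w
        prev = cur
    return round(total / (2 * cmath.pi))

coeffs = [2 - 1j, 0, 3j, 1, 1]      # P(z) = z^4 + z^3 + 3i z^2 + (2 - i), hypothetical
print(winding_number(coeffs, 10.0))  # large circle: the term z^4 dominates
print(winding_number(coeffs, 0.1))   # small circle: the constant term dominates
```

For the large radius the image curve winds around the origin $n = 4$ times, as the relation (5.131) suggests, while for the small radius it stays near $P(0) \ne 0$ and does not enclose the origin at all.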


Theorem 2. Every polynomial

$$P(z) = c_0 + c_1 z + \cdots + c_n z^n$$

of degree $n \ge 1$ with complex coefficients has a root in $\mathbb{C}$.


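Proof. Without loss of generality, we may obviously assume that $c_n = 1$.
Let $\mu = \inf_{z \in \mathbb{C}} |P(z)|$. Since $P(z) = z^n \bigl(1 + \frac{c_{n-1}}{z} + \cdots + \frac{c_0}{z^n}\bigr)$, we have

$$|P(z)| \ge |z|^n \Bigl(1 - \frac{|c_{n-1}|}{|z|} - \cdots - \frac{|c_0|}{|z|^n}\Bigr) ,$$

and obviously $|P(z)| > \max\{1, 2\mu\}$ for $|z| > R$ if $R$ is sufficiently large.
Consequently, the points of a sequence $\{z_k\}$ at which $0 \le |P(z_k)| - \mu < \frac{1}{k}$ lie
inside the disk $|z| \le R$.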

We shall verify that there is a point $z_0$ in $\mathbb{C}$ (in fact, in this disk) at
which $|P(z_0)| = \mu$. To do this, we remark that if $z_k = x_k + iy_k$, then
$\max\{|x_k|, |y_k|\} \le |z_k| \le R$ and hence the sequences of real numbers $\{x_k\}$
and $\{y_k\}$ are bounded. Choosing first a convergent subsequence $\{x_{k_l}\}$ from
$\{x_k\}$ and then a convergent subsequence $\{y_{k_{l_m}}\}$ from $\{y_{k_l}\}$, we obtain a
subsequence $z_{k_{l_m}} = x_{k_{l_m}} + iy_{k_{l_m}}$ of the sequence $\{z_k\}$ that has a limit

$$\lim_{m \to \infty} z_{k_{l_m}} = \lim_{m \to \infty} x_{k_{l_m}} + i \lim_{m \to \infty} y_{k_{l_m}} = x_0 + iy_0 = z_0 ,$$

and since $|z_{k_{l_m}}| \to |z_0|$ as $m \to \infty$, it follows that $|z_0| \le R$. So as to avoid cumbersome notation,
and not have to pass to subsequences, we shall assume that the sequence
$\{z_k\}$ itself converges. It follows from the continuity of $P(z)$ at $z_0 \in \mathbb{C}$ that
$\lim_{k \to \infty} P(z_k) = P(z_0)$. But then²⁶ $|P(z_0)| = \lim_{k \to \infty} |P(z_k)| = \mu$.

We shall now assume that $\mu > 0$, and use this assumption to derive a
contradiction. If $P(z_0) \ne 0$, consider the polynomial $Q(z) = \frac{P(z + z_0)}{P(z_0)}$. By
construction $Q(0) = 1$ and $|Q(z)| = \frac{|P(z + z_0)|}{|P(z_0)|} \ge 1$.
Since $Q(0) = 1$, the polynomial $Q(z)$ has the form

$$Q(z) = 1 + q_k z^k + q_{k+1} z^{k+1} + \cdots + q_n z^n ,$$


26 Observe that on the one hand we have shown that from every sequence of complex
numbers whose moduli are bounded one can extract a convergent subsequence,
while on the other hand we have given another possible proof of the theorem
that a continuous function on a closed interval has a minimum, as was done here
for the disk $|z| \le R$.
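where $|q_k| \ne 0$ and $1 \le k \le n$. If $q_k = \rho e^{i\psi}$, then for $\varphi = \frac{\pi - \psi}{k}$ we shall have
$q_k \cdot (e^{i\varphi})^k = \rho e^{i\psi} e^{i(\pi - \psi)} = \rho e^{i\pi} = -\rho = -|q_k|$. Then, for $z = re^{i\varphi}$ we obtain

$$|Q(re^{i\varphi})| \le |1 + q_k z^k| + \bigl(|q_{k+1} z^{k+1}| + \cdots + |q_n z^n|\bigr) =$$
$$= \bigl|1 - r^k |q_k|\bigr| + r^{k+1}\bigl(|q_{k+1}| + \cdots + |q_n| r^{n-k-1}\bigr) =$$
$$= 1 - r^k\bigl(|q_k| - r|q_{k+1}| - \cdots - r^{n-k}|q_n|\bigr) < 1 ,$$

if $r$ is sufficiently close to 0. But $|Q(z)| \ge 1$ for $z \in \mathbb{C}$. This contradiction
shows that $P(z_0) = 0$. □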
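The key step of the proof, the choice of a direction $\varphi = (\pi - \psi)/k$ along which $|P|$ decreases near a point where $P \ne 0$, can be traced numerically. The sketch below is an illustration under assumptions: the helper names, the tolerance used to detect the first nonzero coefficient $q_k$, and the sample polynomial are choices made here.

```python
import cmath

def taylor_shift(coeffs, z0):
    """Coefficients of P(z + z0), where coeffs[j] is the coefficient of z**j in P."""
    c = [complex(x) for x in coeffs]
    n = len(c)
    for i in range(n - 1):                     # repeated Horner steps
        for j in range(n - 2, i - 1, -1):
            c[j] += z0 * c[j + 1]
    return c

def decrease_direction(coeffs, z0, tol=1e-12):
    """Return (k, phi) for Q(z) = P(z + z0)/P(z0) = 1 + q_k z**k + ... (needs P(z0) != 0)."""
    shifted = taylor_shift(coeffs, z0)
    q = [c / shifted[0] for c in shifted]
    k = next(j for j in range(1, len(q)) if abs(q[j]) > tol)
    psi = cmath.phase(q[k])
    return k, (cmath.pi - psi) / k

def poly(coeffs, z):
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result

coeffs = [1, 0, 1]                  # P(z) = z**2 + 1, which has no real roots
z0 = 0.0
k, phi = decrease_direction(coeffs, z0)
for r in (0.5, 0.1, 0.01):
    z = z0 + r * cmath.exp(1j * phi)
    print(r, abs(poly(coeffs, z)), "<", abs(poly(coeffs, z0)))
```

Here $k = 2$ and $\varphi = \pi/2$: the modulus of $z^2 + 1$ cannot be decreased by moving from 0 along the real axis, but it does decrease along the imaginary axis, which is why the minimum of $|P|$ over $\mathbb{C}$ cannot be positive.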


Remark 1. The first proof of the theorem that every algebraic equation with 
complex coefficients has a solution in C (which is traditionally known as 
the fundamental theorem of algebra) was given by Gauss, who in general 
breathed real life into the so-called “imaginary” numbers by finding a variety 
of profound applications for them. 


Remark 2. A polynomial with real coefficients $P(z) = a_0 + \cdots + a_n z^n$, as we
know, does not always have real roots. However, compared with an arbitrary
polynomial having complex coefficients, it does have the unusual property
that if $P(z_0) = 0$, then $P(\bar{z}_0) = 0$ also. Indeed, it follows from the definition
of the complex conjugate and the rules for adding complex numbers
that $\overline{(z_1 + z_2)} = \bar{z}_1 + \bar{z}_2$. It follows from the trigonometric form of writing a
complex number and the rules for multiplying complex numbers that

$$\overline{(z_1 \cdot z_2)} = \overline{(r_1 e^{i\varphi_1} \cdot r_2 e^{i\varphi_2})} = \overline{r_1 r_2 e^{i(\varphi_1 + \varphi_2)}} = r_1 r_2 e^{-i(\varphi_1 + \varphi_2)} = r_1 e^{-i\varphi_1} \cdot r_2 e^{-i\varphi_2} = \bar{z}_1 \cdot \bar{z}_2 .$$

Thus,

$$\overline{P(z_0)} = \overline{a_0 + \cdots + a_n z_0^n} = \bar{a}_0 + \cdots + \bar{a}_n \bar{z}_0^n = a_0 + \cdots + a_n \bar{z}_0^n = P(\bar{z}_0) ,$$

and if $P(z_0) = 0$, then $\overline{P(z_0)} = P(\bar{z}_0) = 0$.


Corollary 1. Every polynomial $P(z) = c_0 + \cdots + c_n z^n$ of degree $n \ge 1$ with
complex coefficients admits a representation in the form

$$P(z) = c_n(z - z_1) \cdots (z - z_n) , \tag{5.132}$$

where $z_1, \ldots, z_n \in \mathbb{C}$ (and the numbers $z_1, \ldots, z_n$ are not necessarily all distinct).
This representation is unique up to the order of the factors.


Proof. From the long division algorithm for dividing one polynomial P(z) by 
another polynomial Q(z) of lower degree, we find that P(z) = q(z)Q(z)+r(z), 
where q(z) and r(z) are polynomials, the degree of r(z) being less than the 
degree m of Q(z). Thus if m = 1, then r(z) = r is simply a constant. 

Let $z_1$ be a root of the polynomial $P(z)$. Then $P(z) = q(z)(z - z_1) + r$, and
since $P(z_1) = r$, it follows that $r = 0$. Hence if $z_1$ is a root of $P(z)$, we have
the representation $P(z) = (z - z_1)q(z)$. The degree of the polynomial $q(z)$ is
$n - 1$, and we can repeat the reasoning with $q(z)$ if $n - 1 \ge 1$. By induction
we find that $P(z) = c(z - z_1) \cdots (z - z_n)$. Since we must have $cz^n = c_n z^n$, it
follows that $c = c_n$. □


Corollary 2. Every polynomial $P(z) = a_0 + \cdots + a_n z^n$ with real coefficients
can be expanded as a product of linear and quadratic polynomials with real
coefficients.


Proof. This follows from Corollary 1 and Remark 2, by virtue of which for
any root $z_k$ of $P(z)$ the number $\bar{z}_k$ is also a root. Then, carrying out the
multiplication $(z - z_k)(z - \bar{z}_k)$ in the product (5.132), we obtain the quadratic
polynomial $z^2 - (z_k + \bar{z}_k)z + |z_k|^2$ with real coefficients. The number $c_n$, which
equals $a_n$, is a real number in this case and can be moved inside one of the
sets of parentheses without changing the degree of that factor. □


By multiplying out all the identical factors in (5.132), we can rewrite that
product:

$$P(z) = c_n(z - z_1)^{k_1} \cdots (z - z_p)^{k_p} . \tag{5.133}$$

The number $k_j$ is called the multiplicity of the root $z_j$.
Since $P(z) = (z - z_j)^{k_j} Q(z)$, where $Q(z_j) \ne 0$, it follows that

$$P'(z) = k_j(z - z_j)^{k_j - 1} Q(z) + (z - z_j)^{k_j} Q'(z) = (z - z_j)^{k_j - 1} R(z) ,$$

where $R(z_j) = k_j Q(z_j) \ne 0$. We thus arrive at the following conclusion.


Corollary 3. Every root $z_j$ of multiplicity $k_j > 1$ of a polynomial $P(z)$ is a
root of multiplicity $k_j - 1$ of the derivative $P'(z)$.


Not yet being in a position to find the roots of the polynomial $P(z)$, we can
use this last proposition and the representation (5.133) to find a polynomial
$p(z) = (z - z_1) \cdots (z - z_p)$ whose roots are the same as those of $P(z)$ but are
of multiplicity 1.

Indeed, by the Euclidean algorithm, we first find the greatest common
divisor $q(z)$ of $P(z)$ and $P'(z)$. By Corollary 3, the expansion (5.133), and
Theorem 2, the polynomial $q(z)$ is equal, apart from a constant factor, to
$(z - z_1)^{k_1 - 1} \cdots (z - z_p)^{k_p - 1}$. Hence by dividing $P(z)$ by $q(z)$ we obtain, apart
from a constant factor that can be removed by dividing out the coefficient of
$z^p$, a polynomial $p(z) = (z - z_1) \cdots (z - z_p)$.
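The procedure just described, computing the greatest common divisor of a polynomial and its derivative and then dividing it out, can be sketched with exact rational arithmetic. This is only an illustration: the list-of-coefficients representation (lowest degree first), the helper names, and the test polynomial $(x + 1)^3 (x^2 + 1)^2$ written in expanded form are assumptions made here.

```python
from fractions import Fraction

def trim(p):
    """Drop zero leading coefficients; p[k] is the coefficient of x**k."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def divmod_poly(a, b):
    """Long division: return (quotient, remainder) with a = quotient*b + remainder."""
    a, b = trim(a), trim(b)
    quotient = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    remainder = [Fraction(c) for c in a]
    while len(remainder) >= len(b):
        shift = len(remainder) - len(b)
        factor = remainder[-1] / b[-1]
        quotient[shift] = factor
        for i, c in enumerate(b):
            remainder[i + shift] -= factor * c   # cancels the leading term exactly
        remainder = trim(remainder)
    return trim(quotient), remainder

def gcd_poly(a, b):
    """Euclidean algorithm; the result is normalized to leading coefficient 1."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, divmod_poly(a, b)[1]
    return [Fraction(c) / a[-1] for c in a]

def derivative(p):
    return [k * c for k, c in enumerate(p)][1:]

Q = [1, 3, 5, 7, 7, 5, 3, 1]        # x^7 + 3x^6 + 5x^5 + 7x^4 + 7x^3 + 5x^2 + 3x + 1
d = gcd_poly(Q, derivative(Q))      # common divisor of Q and Q'
q, rem = divmod_poly(Q, d)          # polynomial with the same roots, each simple
print([int(c) for c in d])          # [1, 2, 2, 2, 1], i.e. x^4 + 2x^3 + 2x^2 + 2x + 1
print([int(c) for c in q])          # [1, 1, 1, 1],    i.e. x^3 + x^2 + x + 1
```

Exact fractions are used instead of floating-point numbers because the Euclidean algorithm is sensitive to rounding: a remainder that should vanish exactly must be recognized as zero.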

Now consider the ratio $R(x) = \frac{P(x)}{Q(x)}$ of two polynomials, where $Q(x) \not\equiv$
const. If the degree of $P(x)$ is larger than that of $Q(x)$, we apply the division
algorithm and represent $P(x)$ as $p(x)Q(x) + r(x)$, where $p(x)$ and $r(x)$ are
polynomials, the degree of $r(x)$ being less than that of $Q(x)$. Thus we obtain
a representation of the form $R(x) = p(x) + \frac{r(x)}{Q(x)}$, where the fraction $\frac{r(x)}{Q(x)}$ is
now a proper fraction in the sense that the degree of $r(x)$ is less than that of
$Q(x)$.

The corollary we are about to state involves the representation of a proper
fraction as a sum of fractions called partial fractions.
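Corollary 4. a) If $Q(z) = (z - z_1)^{k_1} \cdots (z - z_p)^{k_p}$ and $\frac{P(z)}{Q(z)}$ is a proper
fraction, there exists a unique representation of the fraction $\frac{P(z)}{Q(z)}$ in the form

$$\frac{P(z)}{Q(z)} = \sum_{j=1}^{p} \Bigl( \sum_{k=1}^{k_j} \frac{a_{jk}}{(z - z_j)^k} \Bigr) . \tag{5.134}$$

b) If $P(x)$ and $Q(x)$ are polynomials with real coefficients and

$$Q(x) = (x - x_1)^{k_1} \cdots (x - x_l)^{k_l} (x^2 + p_1 x + q_1)^{m_1} \cdots (x^2 + p_n x + q_n)^{m_n} ,$$

there exists a unique representation of the proper fraction $\frac{P(x)}{Q(x)}$ in the form

$$\frac{P(x)}{Q(x)} = \sum_{j=1}^{l} \Bigl( \sum_{k=1}^{k_j} \frac{a_{jk}}{(x - x_j)^k} \Bigr) + \sum_{j=1}^{n} \Bigl( \sum_{k=1}^{m_j} \frac{b_{jk} x + c_{jk}}{(x^2 + p_j x + q_j)^k} \Bigr) , \tag{5.135}$$

where $a_{jk}$, $b_{jk}$, and $c_{jk}$ are real numbers.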


We remark that there is a universal method of finding the expansions 
(5.134) and (5.135) known as the method of undetermined coefficients, al- 
though this method is not always the shortest way. It consists of putting all 
the terms on the right-hand side of (5.134) or (5.135) over a common de- 
nominator, then equating the coefficients of the resulting numerator to the 
corresponding coefficients of P(x). The system of linear equations that results 
always has a unique solution because of Corollary 4. 

Since we shall as a rule be interested in the expansion of a specific fraction, 
which we shall obtain by the method of undetermined coefficients, we require 
nothing more from Corollary 4 than the assurance that it is always possible 
to do so. For that reason, we shall not bother to go through the proof. It is 
usually couched in algebraic language in a course of modern algebra and in 
analytic language in a course in the theory of functions of a complex variable. 

Let us consider a specially chosen example to illustrate what has just been 
explained. 


Example 17. Let

$$P(x) = 2x^6 + 3x^5 + 6x^4 + 6x^3 + 10x^2 + 3x + 2 ,$$
$$Q(x) = x^7 + 3x^6 + 5x^5 + 7x^4 + 7x^3 + 5x^2 + 3x + 1 .$$

Find the partial-fraction expansion (5.135) of the fraction $\frac{P(x)}{Q(x)}$.
First of all, the problem is complicated by the fact that we do not know
the factors of the polynomial $Q(x)$. Let us try to simplify the situation by
eliminating any multiple roots there may be of $Q(x)$. We find

$$Q'(x) = 7x^6 + 18x^5 + 25x^4 + 28x^3 + 21x^2 + 10x + 3 .$$
By a rather fatiguing, but feasible computation using the Euclidean algorithm,
we find the greatest common divisor

$$d(x) = x^4 + 2x^3 + 2x^2 + 2x + 1$$

of $Q(x)$ and $Q'(x)$. We have written the greatest common divisor with leading
coefficient 1.
Dividing $Q(x)$ by $d(x)$, we obtain the polynomial

$$q(x) = x^3 + x^2 + x + 1 ,$$

which has the same roots as $Q(x)$, but each with multiplicity 1. The root $-1$
is easily guessed. After $q(x)$ is divided by $x + 1$, we find a quotient of $x^2 + 1$.
Thus

$$q(x) = (x + 1)(x^2 + 1) ,$$

and then by successively dividing $d(x)$ by $x^2 + 1$ and $x + 1$, we find the
factorization of $d(x)$:

$$d(x) = (x + 1)^2 (x^2 + 1) ,$$

and then the factorization

$$Q(x) = (x + 1)^3 (x^2 + 1)^2 .$$
Thus, by Corollary 4b, we are seeking an expansion of the fraction $\frac{P(x)}{Q(x)}$ in
the form

$$\frac{P(x)}{Q(x)} = \frac{a_{11}}{x + 1} + \frac{a_{12}}{(x + 1)^2} + \frac{a_{13}}{(x + 1)^3} + \frac{b_{11} x + c_{11}}{x^2 + 1} + \frac{b_{12} x + c_{12}}{(x^2 + 1)^2} .$$

Putting the right-hand side over a common denominator and equating the
coefficients of the resulting numerator to those of $P(x)$, we arrive at a system
of seven equations in seven unknowns, solving which, we finally obtain

$$\frac{P(x)}{Q(x)} = \frac{1}{x + 1} - \frac{1}{(x + 1)^2} + \frac{2}{(x + 1)^3} + \frac{x - 1}{x^2 + 1} + \frac{x + 1}{(x^2 + 1)^2} .$$

5.5.6 Problems and Exercises 


1. Using the geometric interpretation of complex numbers
a) explain the inequalities $|z_1 + z_2| \le |z_1| + |z_2|$ and $|z_1 + \cdots + z_n| \le |z_1| + \cdots + |z_n|$;
b) exhibit the locus of points in the plane $\mathbb{C}$ satisfying the relation $|z - 1| + |z + 1| \le 3$;
c) describe all the $n$th roots of unity and find their sum;
d) explain the action of the transformation of the plane $\mathbb{C}$ defined by the formula
$z \mapsto \bar{z}$.
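2. Find the following sums:
a) $1 + q + \cdots + q^n$;
b) $1 + q + \cdots + q^n + \cdots$ for $|q| < 1$;
c) $1 + e^{i\varphi} + \cdots + e^{in\varphi}$;
d) $1 + re^{i\varphi} + \cdots + r^n e^{in\varphi}$;
e) $1 + re^{i\varphi} + \cdots + r^n e^{in\varphi} + \cdots$ for $|r| < 1$;
f) $1 + r\cos\varphi + \cdots + r^n \cos n\varphi$;
g) $1 + r\cos\varphi + \cdots + r^n \cos n\varphi + \cdots$ for $|r| < 1$;
h) $1 + r\sin\varphi + \cdots + r^n \sin n\varphi$;
i) $1 + r\sin\varphi + \cdots + r^n \sin n\varphi + \cdots$ for $|r| < 1$.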


3. Find the modulus and argument of the complex number $\lim_{n \to \infty} \bigl(1 + \frac{z}{n}\bigr)^n$ and
verify that this number is $e^z$.


4. a) Show that the equation $e^w = z$ in $w$ has the solution $w = \ln |z| + i \operatorname{Arg} z$. It is
natural to regard $w$ as the natural logarithm of $z$. Thus $w = \operatorname{Ln} z$ is not a functional
relation, since $\operatorname{Arg} z$ is multi-valued.

b) Find $\operatorname{Ln} 1$ and $\operatorname{Ln} i$.

c) Set $z^\alpha = e^{\alpha \operatorname{Ln} z}$. Find $1^\pi$ and $i^i$.

d) Using the representation $w = \sin z = \frac{1}{2i}\bigl(e^{iz} - e^{-iz}\bigr)$, obtain an expression
for $z = \arcsin w$.
e) Are there points in C where |sin z| = 2? 
5. a) Investigate whether the function $f(z) = \frac{1}{1 + z^2}$ is continuous at all points of
the plane $\mathbb{C}$.

b) Expand the function $\frac{1}{1 + z^2}$ in a power series around $z_0 = 0$ and find its radius
of convergence.

c) Solve parts a) and b) for the function $\frac{1}{1 + \lambda^2 z^2}$, where $\lambda \in \mathbb{R}$ is a parameter.

Can you make a conjecture as to how the radius of convergence is determined by
the relative location of certain points in the plane $\mathbb{C}$? Could this relation have been
understood on the basis of the real line alone, that is, by expanding the function
$\frac{1}{1 + \lambda^2 x^2}$, where $\lambda \in \mathbb{R}$ and $x \in \mathbb{R}$?


6. a) Investigate whether the Cauchy function

$$f(z) = \begin{cases} e^{-1/z^2} , & z \ne 0 , \\ 0 , & z = 0 \end{cases}$$

is continuous at $z = 0$.

b) Is the restriction $f|_{\mathbb{R}}$ of the function $f$ in a) to the real line continuous?

c) Does the Taylor series of the function $f$ in a) exist at the point $z_0 = 0$?

d) Are there functions analytic at a point $z_0 \in \mathbb{C}$ whose Taylor series converge
only at the point $z_0$?

e) Invent a power series $\sum_{n=0}^{\infty} c_n(z - z_0)^n$ that converges only at the one point $z_0$.
n=0 
7. a) Making the formal substitution $z - a = (z - z_0) + (z_0 - a)$ in the power
series $\sum_{n=0}^{\infty} A_n(z - a)^n$ and gathering like terms, obtain a series $\sum_{n=0}^{\infty} C_n(z - z_0)^n$ and
expressions for its coefficients in terms of $A_k$ and $(z_0 - a)^k$, $k = 0, 1, \ldots$.

b) Verify that if the original series converges in the disk $|z - a| < R$ and
$|z_0 - a| = r < R$, then the series defining $C_n$, $n = 0, 1, \ldots$, converge absolutely and
the series $\sum_{n=0}^{\infty} C_n(z - z_0)^n$ converges for $|z - z_0| < R - r$.

c) Show that if $f(z) = \sum_{n=0}^{\infty} A_n(z - a)^n$ in the disk $|z - a| < R$ and $|z_0 - a| < R$,
then in the disk $|z - z_0| < R - |z_0 - a|$ the function $f$ admits the representation

$$f(z) = \sum_{n=0}^{\infty} C_n(z - z_0)^n .$$


8. Verify that 

a) as the point $z \in \mathbb{C}$ traverses the circle $|z| = r > 1$ the point $w = z + z^{-1}$
traverses an ellipse with center at zero and foci at $\pm 2$;

b) when a complex number is squared (more precisely, under the mapping $w \mapsto
w^2$), such an ellipse maps to an ellipse with a focus at 0, traversed twice;

c) under squaring of complex numbers, any ellipse with center at zero maps to
an ellipse with a focus at 0.


5.6 Some Examples of the Application 
of Differential Calculus in Problems of Natural Science 


In this section we shall study some problems from natural science that are 
very different from one another in their statement, but which, as will be seen, 
have closely related mathematical models. That model is none other than a 
very simple differential equation for the function we are interested in. From 
the study of one such example — the two-body problem — we really began the 
construction of differential calculus. The study of the system of equations 
we obtained for this problem was inaccessible at the time. Here we shall
consider some problems that can be solved completely at our present level of 
knowledge. In addition to the pleasure of seeing mathematical machinery in 
action in a specific case, from the series of examples in this section we shall in 
particular acquire additional confidence in both the naturalness with which 
the exponential function exp x arises and in the usefulness of extending it to 
the complex domain. 


5.6.1 Motion of a Body of Variable Mass 


Consider a rocket moving in a straight line in outer space, far from gravitating 
bodies (Fig. 5.33). 
Fig. 5.33.


Let M(t) be the mass of the rocket (including fuel) at time t, V(t) its 
velocity at time t, and w the speed (relative to the rocket) with which fuel 
flows out of the nozzle of the rocket as it burns. 

We wish to establish the connection among these quantities. 

Under these assumptions, we can regard the rocket with fuel as a closed 
system whose momentum (quantity of motion) remains constant over time. 

At time t the momentum of the system is M(t)V(t). 

At time t + h the momentum of the rocket with the remaining fuel is
M(t+h)V(t+h), and the momentum $\Delta I$ of the mass of fuel ejected over
that time, $|\Delta M| = |M(t+h) - M(t)| = -(M(t+h) - M(t))$, lies between the
bounds

$$(V(t) - w)|\Delta M| < \Delta I < (V(t+h) - w)|\Delta M| ,$$

that is, $\Delta I = (V(t) - w)|\Delta M| + \alpha(h)|\Delta M|$, and it follows from the continuity
of V(t) that $\alpha(h) \to 0$ as $h \to 0$.

Equating the momenta of the system at times t and t + h, we have

$$M(t)V(t) = M(t+h)V(t+h) + (V(t) - w)|\Delta M| + \alpha(h)|\Delta M| ,$$

or, after substituting $|\Delta M| = -(M(t+h) - M(t))$ and simplifying,

$$M(t+h)\bigl(V(t+h) - V(t)\bigr) = -w\bigl(M(t+h) - M(t)\bigr) + \alpha(h)\bigl(M(t+h) - M(t)\bigr) .$$

Dividing this last equation by h and passing to the limit as $h \to 0$, we obtain

$$M(t)V'(t) = -wM'(t) . \tag{5.136}$$


This is the relation we were seeking between the functions we were inter- 
ested in, V(t), M(t), and their derivatives. 

We now must find the relation between the functions V(t) and M(t) 
themselves, using the relation between their derivatives. In general a problem 
of this type is more difficult than the problem of finding the relations between 
the derivatives knowing the relation between the functions. However, in the 
present case, this problem has a completely elementary solution. 

Indeed, after dividing Eq. (5.136) by M(t), we can rewrite it in the form

$$V'(t) = (-w \ln M)'(t) . \tag{5.137}$$

But if the derivatives of two functions are equal on an interval, then the
functions themselves differ by at most a constant on that interval.
Thus it follows from (5.137) that

$$V(t) = -w \ln M(t) + c . \tag{5.138}$$


If it is known, for example, that $V(0) = V_0$, this initial condition determines
the constant c completely. Indeed, from (5.138) we find

$$c = V_0 + w \ln M(0) ,$$

and we then find the formula we were seeking²⁷

$$V(t) = V_0 + w \ln \frac{M(0)}{M(t)} . \tag{5.139}$$


It is useful to remark that if $m_R$ is the mass of the body of the rocket
and $m_F$ is the mass of the fuel and V is the terminal velocity achieved by
the rocket when all the fuel is expended, substituting $M(0) = m_R + m_F$ and
$M(t) = m_R$, we find

$$V = V_0 + w \ln\left(1 + \frac{m_F}{m_R}\right) .$$

This last formula shows very clearly that the terminal velocity is affected
not so much by the ratio $m_F/m_R$ inside the logarithm as by the outflow speed
w, which depends on the type of fuel used. It follows in particular from this
formula that if $V_0 = 0$, then in order to impart a velocity V to a rocket whose
own mass is $m_R$ one must have the following initial supply of fuel:

$$m_F = m_R\left(e^{V/w} - 1\right) .$$
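As a numerical illustration, the two formulas just obtained can be evaluated directly. The following Python sketch is not part of the original text; the function names and the sample values (a body of 1000 kg, outflow speed 3 km/s, target velocity 8 km/s) are assumptions chosen only for illustration.

```python
import math

def terminal_velocity(v0, w, m_rocket, m_fuel):
    """Velocity after all fuel is expended: V = V0 + w*ln(1 + mF/mR)."""
    return v0 + w * math.log(1.0 + m_fuel / m_rocket)

def fuel_required(v, w, m_rocket):
    """Fuel supply needed to reach velocity v from rest: mF = mR*(exp(v/w) - 1)."""
    return m_rocket * (math.exp(v / w) - 1.0)

# Hypothetical data: 1000 kg rocket body, outflow speed 3000 m/s, target 8000 m/s.
m_f = fuel_required(8000.0, 3000.0, 1000.0)
print("fuel mass, kg:", m_f)
# Substituting the fuel mass back should recover the target velocity.
print("velocity, m/s:", terminal_velocity(0.0, 3000.0, 1000.0, m_f))
```

The exponential dependence of the fuel supply on V/w is what makes the outflow speed w so much more influential than the mass ratio.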


5.6.2 The Barometric Formula 


This is the name given to the formula that exhibits the dependence of atmospheric
pressure on elevation above sea level.

Let p(h) be the pressure at elevation h. Since p(h) is the weight of the
column of air above an area of 1 cm² at elevation h, it follows that $p(h + \Delta)$
differs from p(h) by the weight of the portion of the gas lying in the
parallelepiped whose bases are the original area of 1 cm² and the same area
at elevation $h + \Delta$. Let $\rho(h)$ be the density of air at elevation h. Since $\rho(h)$
depends continuously on h, one may assume that the mass of this portion of
air is calculated from the formula

$$\rho(\xi)\ \mathrm{g/cm^3} \cdot 1\ \mathrm{cm^2} \cdot \Delta\ \mathrm{cm} = \rho(\xi)\Delta\ \mathrm{g} ,$$

where $\xi$ is some height between h and $h + \Delta$. Hence the weight of that mass²⁸
is $g \cdot \rho(\xi)\Delta$.
Thus,

$$p(h + \Delta) - p(h) = -g\rho(\xi)\Delta .$$

After dividing this equality by $\Delta$ and passing to the limit as $\Delta \to 0$,
taking account of the relation $\xi \to h$, we obtain

$$p'(h) = -g\rho(h) . \tag{5.140}$$

27 Formula (5.139) is sometimes connected with the name of K. E. Tsiolkovskii (1857-1935),
a Russian scientist and the founder of the theory of space flight. But it
seems to have been first obtained by the Russian specialist in theoretical mechanics
I. V. Meshcherskii (1859-1935) in an 1897 paper devoted to the dynamics of
a point of variable mass.


Thus the rate of variation in atmospheric pressure has turned out to be 
proportional to the density of the air at the corresponding elevation. 

To obtain an equation for the function p(h), we eliminate the function
$\rho(h)$ from (5.140). By Clapeyron’s law²⁹ (the ideal gas law) the pressure p,
molar volume V, and temperature T of the gas (on the Kelvin³⁰ scale) are
connected by the relation

$$\frac{pV}{T} = R , \tag{5.141}$$

where R is the so-called universal gas constant. If M is the mass of one mole
of air and V its volume, then $\rho = \frac{M}{V}$, so that from (5.141) we find

$$p = \frac{1}{V} \cdot R \cdot T = \frac{M}{V} \cdot \frac{R}{M} \cdot T = \rho \cdot \frac{R}{M} \cdot T .$$

Setting $\lambda = \frac{R}{M}T$, we thus have

$$p = \lambda(T)\rho . \tag{5.142}$$

If we now assume that the temperature of the layer of air we are describing
is constant, we finally obtain from (5.140) and (5.142)

$$p'(h) = -\frac{g}{\lambda}\,p(h) . \tag{5.143}$$
This differential equation can be rewritten as

$$\frac{p'(h)}{p(h)} = -\frac{g}{\lambda}$$

or

$$(\ln p)'(h) = \left(-\frac{g}{\lambda}h\right)' ,$$

from which we derive

$$\ln p(h) = -\frac{g}{\lambda}h + c ,$$


or

$$p(h) = e^c \cdot e^{-(g/\lambda)h} .$$

28 Within the region where the atmosphere is noticeable, g may be regarded as
constant.

29 B. P. E. Clapeyron (1799-1864), French physicist who studied thermodynamics.

30 W. Thomson (Lord Kelvin) (1824-1907), famous British physicist.


The factor $e^c$ can be determined from the known initial condition
$p(0) = p_0$, from which it follows that $e^c = p_0$.
Thus, we have found the following dependence of pressure on elevation:

$$p = p_0 e^{-(g/\lambda)h} . \tag{5.144}$$


For air at room temperature (of order 300 K = 27° C) the value of $\lambda$ is
known: $\lambda \approx 7.7 \cdot 10^8$ (cm/s)². It is also known that $g \approx 10^3$ cm/s². Thus
formula (5.144) acquires a completely finished form after these numerical
values of g and $\lambda$ are substituted. In particular, one can see from (5.144)
that the pressure decreases by a factor of e ($\approx 3$) at elevation $h = \frac{\lambda}{g} =
7.7 \cdot 10^5$ cm = 7.7 km. It increases by the same factor if one descends
in a mine shaft to a depth of the order of 7.7 km.
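The barometric formula can be tabulated in a few lines. The Python sketch below is an illustration added here, not part of the original text; it uses the numerical values of g and λ quoted above, and the function name and the choice of sample elevations are our own assumptions.

```python
import math

LAMBDA = 7.7e8   # (cm/s)^2, value quoted in the text for air near 300 K
G = 1.0e3        # cm/s^2

def pressure(h_cm, p0=1.0):
    """Barometric formula (5.144): p = p0 * exp(-(g/lambda) * h), h in cm."""
    return p0 * math.exp(-(G / LAMBDA) * h_cm)

# Pressure relative to sea level at a few elevations (negative h: mine shaft).
for km in (-7.7, 0.0, 1.0, 3.0, 7.7):
    print(km, "km:", pressure(km * 1.0e5))
```

At h = 7.7 km the printed ratio is 1/e, and at a depth of 7.7 km it is e, in agreement with the remark above.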


5.6.3 Radioactive Decay, Chain Reactions, and Nuclear Reactors 


It is known that the nuclei of heavy elements are subject to sporadic (spon- 
taneous) decay. This phenomenon is the so-called natural radioactivity. 

The main statistical law of radioactivity (which is consequently valid for 
amounts and concentrations of a substance that are not too small) is that 
the number of decay events over a small interval of time h starting at time t 
is proportional to h and to the number N(t) of atoms of the substance that 
have not decayed up to time t, that is, 


$$N(t + h) - N(t) \approx -\lambda N(t)h ,$$

where $\lambda > 0$ is a numerical coefficient that is characteristic of the chemical
element.
Thus the function N(t) satisfies the now familiar differential equation

$$N'(t) = -\lambda N(t) , \tag{5.145}$$

from which it follows that

$$N(t) = N_0 e^{-\lambda t} ,$$

where $N_0 = N(0)$ is the initial number of atoms of the substance.

The time T required for half of the initial number of atoms to decay is
called the half-life of the substance. The quantity T can thus be found from
the equation $e^{-\lambda T} = \frac{1}{2}$, that is, $T = \frac{\ln 2}{\lambda} \approx \frac{0.69}{\lambda}$. For example, for polonium-210
(Po²¹⁰) the half-life T is approximately 138 days, for radium-226 (Ra²²⁶),
$T \approx 1600$ years, for uranium-235 (U²³⁵), $T \approx 7.1 \cdot 10^8$ years, and for its isotope
U²³⁸, $T \approx 4.5 \cdot 10^9$ years.
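The relation between the half-life and the coefficient λ is easy to check numerically. The following Python sketch is an added illustration; the function names are our own, and the half-lives are the approximate values quoted above (138 days is converted to years assuming 365 days per year).

```python
import math

def decay_constant(half_life):
    """lambda = ln 2 / T for a substance with half-life T."""
    return math.log(2.0) / half_life

def surviving_fraction(t, half_life):
    """N(t)/N0 = exp(-lambda*t); t and half_life are in the same time unit."""
    return math.exp(-decay_constant(half_life) * t)

# Approximate half-lives from the text, in years.
half_lives = {"Po-210": 138.0 / 365.0, "Ra-226": 1600.0,
              "U-235": 7.1e8, "U-238": 4.5e9}
for name, T in half_lives.items():
    # Fraction of the original atoms that have not decayed after 1000 years.
    print(name, surviving_fraction(1000.0, T))
```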
A nuclear reaction is an interaction of nuclei or of a nucleus with elementary
particles resulting in the appearance of a nucleus of a new type. This
may be nuclear fusion, in which the coalescence of the nuclei of lighter ele- 
ments leads to the formation of nuclei of a heavier element (for example, two 
nuclei of heavy hydrogen — deuterium — yield a helium nucleus along with 
a release of energy); or it may be the decay of a nucleus and the formation 
of one or more nuclei of lighter elements. In particular, such decay occurs in 
approximately half of the cases when a neutron collides with a U²³⁵ nucleus.
The breakup of the uranium nucleus leads to the formation of 2 or 3 new neu- 
trons, which may then participate in further interactions with nuclei, causing 
them to split and thereby leading to further multiplication of the number of 
neutrons. A nuclear reaction of this type is called a chain reaction. 

We shall describe a theoretical mathematical model of a chain reaction in 
a radioactive element and obtain the law of variation in the number N(t) of 
neutrons as a function of time. 

We take the substance to have the shape of a sphere of radius r. If r 
is not too small, on the one hand new neutrons will be generated over the 
time interval h measured from some time t in a number proportional to h 
and N(t), while on the other hand some of the neutrons will be lost, having 
moved outside the sphere. 

If v is the velocity of a neutron, then the only ones that can leave the 
sphere in time h are those lying within vh of its boundary, and of those only 
the ones whose direction of motion is approximately along a radius. Assuming 
that those neutrons constitute a fixed proportion of the ones lying in this zone, 
and that neutrons are distributed approximately uniformly throughout the 
sphere, one can say that the number of neutrons lost over the time interval 
h is proportional to N(t) and the ratio of the volume of this boundary layer 
to the volume of the sphere. 

What has just been said leads to the equality 


$$N(t + h) - N(t) \approx \alpha N(t)h - \frac{\beta}{r}N(t)h \tag{5.146}$$

(since the volume of the boundary layer is approximately $4\pi r^2 vh$, and the
volume of the sphere is $\frac{4}{3}\pi r^3$). Here the coefficients $\alpha$ and $\beta$ depend only on
the particular radioactive substance.

After dividing by h and passing to the limit in (5.146) as $h \to 0$, we obtain

$$N'(t) = \left(\alpha - \frac{\beta}{r}\right)N(t) , \tag{5.147}$$

from which

$$N(t) = N_0 \exp\left\{\left(\alpha - \frac{\beta}{r}\right)t\right\} .$$

It can be seen from this formula that when $\left(\alpha - \frac{\beta}{r}\right) > 0$, the number of
neutrons will increase exponentially with time. The nature of this increase,
independently of the initial condition $N_0$, is such that practically total decay
of the substance occurs over a very short time interval, releasing a colossal
amount of energy; that is, an explosion.

If $\left(\alpha - \frac{\beta}{r}\right) < 0$, the reaction ceases very quickly since more neutrons are
being lost than are being generated.

If the boundary condition between the two conditions just considered
holds, that is, $\alpha - \frac{\beta}{r} = 0$, an equilibrium occurs between the generation of
neutrons and their exit from the reaction, as a result of which the number of
neutrons remains approximately constant.

The value of r at which $\alpha - \frac{\beta}{r} = 0$ is called the critical radius, and the
mass of the substance in a sphere of that volume is called the critical mass
of the substance.

For U²³⁵ the critical radius is approximately 8.5 cm, and the critical mass
approximately 50 kg.
approximately 50 kg. 

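In nuclear reactors, where steam is produced by a chain reaction in a
radioactive substance, there is an artificial source of neutrons, providing the
fissionable matter with a certain number n of neutrons per unit time. Thus
for an atomic reactor Eq. (5.147) is slightly altered:

$$N'(t) = \left(\alpha - \frac{\beta}{r}\right)N(t) + n . \tag{5.148}$$

This equation can be solved by the same device as Eq. (5.147), since
$\frac{N'(t)}{(\alpha - \beta/r)N(t) + n}$ is the derivative of the function $\frac{1}{\alpha - \beta/r}\ln\left[\left(\alpha - \frac{\beta}{r}\right)N(t) + n\right]$ if
$\alpha - \frac{\beta}{r} \neq 0$. Consequently the solution of Eq. (5.148) has the form

$$N(t) = \begin{cases} N_0 e^{(\alpha - \beta/r)t} - \dfrac{n}{\alpha - \beta/r}\left[1 - e^{(\alpha - \beta/r)t}\right] & \text{if } \alpha - \frac{\beta}{r} \neq 0 , \\[2ex] N_0 + nt & \text{if } \alpha - \frac{\beta}{r} = 0 . \end{cases}$$

It can be seen from this solution that if $\alpha - \frac{\beta}{r} > 0$ (supercritical mass),
an explosion occurs. If the mass is pre-critical, however, that is, $\alpha - \frac{\beta}{r} < 0$,
we shall very soon have

$$N(t) \approx \frac{n}{\frac{\beta}{r} - \alpha} .$$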

Thus, if the mass of radioactive substance is maintained in a pre-critical 
state but close to critical, then independently of the power of the additional 
neutron source, that is, independently of n, one can obtain higher values 
of N(t) and consequently greater power from the reactor. Keeping the pro- 
cess in the pre-critical zone is a delicate matter and is achieved by a rather 

complicated automatic control system. 
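The behavior of the solution of Eq. (5.148) in the pre-critical regime can be observed numerically. In the Python sketch below, which is an added illustration and not part of the original text, the single parameter `a` stands for the combination α − β/r; the function name and the sample values are hypothetical.

```python
import math

def neutron_count(t, n0, a, n):
    """Solution of N' = a*N + n with N(0) = n0, where a stands for alpha - beta/r."""
    if a == 0.0:
        return n0 + n * t
    return n0 * math.exp(a * t) - (n / a) * (1.0 - math.exp(a * t))

# Hypothetical pre-critical reactor: a < 0, source strength n = 10.
n0, n = 3.0, 10.0
for a in (-2.0, -0.5, -0.1):
    # After a long time N(t) is close to n / (beta/r - alpha) = -n / a.
    print(a, neutron_count(200.0, n0, a, n), -n / a)
```

The closer a is to zero from below, the larger the stationary value −n/a, which is the effect described in the preceding paragraph.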


5.6.4 Falling Bodies in the Atmosphere 


We are now interested in the velocity v(t) of a body falling to Earth under
the influence of gravity.

If there were no air resistance, the relation

$$\dot{v}(t) = g \tag{5.149}$$

would hold for fall from relatively low altitudes. This law follows from Newton’s
second law ma = F and the law of universal gravitation, by virtue of
which for $h \ll R$ (where R is the radius of the Earth)

$$F(t) = G\frac{Mm}{(R + h(t))^2} \approx G\frac{Mm}{R^2} = gm .$$

A body moving in the atmosphere experiences resistance depending on 
the velocity of the motion, as a result of which the velocity of free fall for a 
heavy body in the atmosphere does not increase without bound, but stabilizes 
at some level. For example, a sky diver reaches a steady velocity between 50 
and 60 meters per second in the lower layers of the atmosphere. 

For the range of velocities from 0 to 80 meters per second we shall regard 
the resisting force as proportional to the velocity. The constant of propor- 
tionality of course depends on the shape of the body, which in some cases 
one tries to make streamlined (a bomb) while in other cases the opposite goal 
is pursued (a parachute). Equating the forces acting on the body, we arrive 
at the following equation, which must be satisfied by a body falling in the 
atmosphere: 


$$m\dot{v}(t) = mg - \alpha v . \tag{5.150}$$

Dividing this equation by m and denoting $\frac{\alpha}{m}$ by $\beta$, we finally obtain

$$\dot{v}(t) = -\beta v + g . \tag{5.148'}$$

We have now arrived at an equation that differs from Eq. (5.148) only in
notation. We remark that if we set $-\beta v(t) + g = f(t)$, then, since $f'(t) =
-\beta v'(t)$, one can obtain from (5.148’) the equivalent equation

$$f'(t) = -\beta f(t) ,$$


which is the same as Eq. (5.143) or (5.145) except for notation. Thus we have 
once again arrived at an equation whose solution is the exponential function 


$$f(t) = f(0)e^{-\beta t} .$$

It follows from this that the solution of Eq. (5.148’) has the form

$$v(t) = \frac{1}{\beta}g + \left(v_0 - \frac{1}{\beta}g\right)e^{-\beta t} ,$$

and the solution of the basic equation (5.150) has the form

$$v(t) = \frac{m}{\alpha}g + \left(v_0 - \frac{m}{\alpha}g\right)e^{-(\alpha/m)t} , \tag{5.151}$$

where $v_0 = v(0)$ is the initial vertical velocity of the body.

It can be seen from (5.151) that for $\alpha > 0$ a body falling in the atmosphere
reaches a steady state at which $v(t) \approx \frac{m}{\alpha}g$. Thus, in contrast to fall in airless
space, the velocity of descent in the atmosphere depends not only on the
shape of the body, but also on its mass. As $\alpha \to 0$, the right-hand side of
(5.151) tends to $v_0 + gt$, that is, to the solution of Eq. (5.149) obtained from
(5.150) when $\alpha = 0$.

Using formula (5.151), one can get an idea of how quickly the limiting 
velocity of fall in the atmosphere is reached. 

For example, if a parachute is designed so that a person of average size
will fall with a velocity of the order of 10 meters per second when the parachute
is open, then, if the parachute opens after a free fall during which a velocity of
approximately 50 meters per second has been attained, the person will have
a velocity of about 12 meters per second three seconds after the parachute
opens.

Indeed, from the data just given and relation (5.151) we find $\frac{m}{\alpha}g \approx 10$,
$\frac{\alpha}{m} \approx 1$, $v_0 = 50$ m/s, so that relation (5.151) assumes the form

$$v(t) = 10 + 40e^{-t} .$$

Since $e^3 \approx 20$, for t = 3, we obtain $v \approx 12$ m/s.
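The parachute estimate can be reproduced by evaluating formula (5.151) directly. The Python sketch below is an added illustration; the function name is our own, and the values g = 10 m/s² and α/m = 1 s⁻¹ are the rounded data of the example above.

```python
import math

def fall_velocity(t, v0, g, alpha_over_m):
    """Formula (5.151): v(t) = (m/alpha)*g + (v0 - (m/alpha)*g)*exp(-(alpha/m)*t)."""
    v_limit = g / alpha_over_m          # steady velocity (m/alpha)*g
    return v_limit + (v0 - v_limit) * math.exp(-alpha_over_m * t)

# Rounded data of the example: g = 10 m/s^2, alpha/m = 1 1/s, v0 = 50 m/s.
for t in (0.0, 1.0, 2.0, 3.0, 5.0):
    print(t, fall_velocity(t, 50.0, 10.0, 1.0))
```

Three seconds after the parachute opens the computed velocity is within a few hundredths of 12 m/s, so the rough estimate $e^3 \approx 20$ is entirely adequate here.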


5.6.5 The Number e and the Function exp x Revisited


By examples we have verified (see also Problems 3 and 4 at the end of this 
section) that a number of natural phenomena can be described from the 
mathematical point of view by the same differential equation, namely 


$$f'(x) = \alpha f(x) , \tag{5.152}$$

whose solution f(x) is uniquely determined when the “initial condition” f(0)
is specified. Then

$$f(x) = f(0)e^{\alpha x} .$$


We introduced the number e and the function $e^x = \exp x$ in a rather
formal way earlier, assuring the reader that e really was an important number 
and exp x really was an important function. It is now clear that even if we had 
not introduced this function earlier, it would certainly have been necessary 
to introduce it as the solution of the important, though very simple equation 
(5.152). More precisely, it would have sufficed to introduce the function that 
is the solution of Eq. (5.152) for some specific value of $\alpha$, for example, $\alpha = 1$;
for the general equation (5.152) can be reduced to this case by changing to
a new variable t connected with x by the relation $x = \frac{t}{\alpha}$ ($\alpha \neq 0$).

Indeed, we then have

$$f(x) = f\left(\frac{t}{\alpha}\right) = F(t) , \qquad \frac{df(x)}{dx} = \frac{dF(t)/dt}{dx/dt} = \alpha F'(t) ,$$

and instead of the equation $f'(x) = \alpha f(x)$ we then have $\alpha F'(t) = \alpha F(t)$, or
$F'(t) = F(t)$.
Thus, let us consider the equation 


$$f'(x) = f(x) \tag{5.153}$$

and denote the solution of this equation satisfying f(0) = 1 by exp x.

Let us check to see whether this definition agrees with our previous defi- 
nition of exp x. 

Let us try to calculate the value of f(x) starting from the relation
f(0) = 1 and the assumption that f satisfies (5.153). Since f is differentiable,
it is continuous. But then Eq. (5.153) implies that f'(x) is also continuous.
Moreover, it follows from (5.153) that f also has a second derivative
$f''(x) = f'(x)$, and in general that f is infinitely differentiable. Since the rate
of variation f'(x) of the function f(x) is continuous, the function f' changes
very little over a small interval h of variation of its argument. Therefore
$f(x_0 + h) = f(x_0) + f'(\xi)h \approx f(x_0) + f'(x_0)h$. Let us use this approximate
formula and traverse the interval from 0 to x in small steps of size $h = \frac{x}{n}$,
where $n \in \mathbb{N}$. If $x_0 = 0$ and $x_{k+1} = x_k + h$, we should have

$$f(x_{k+1}) \approx f(x_k) + f'(x_k)h .$$

Taking account of (5.153) and the condition f(0) = 1, we have

$$\begin{aligned}
f(x) = f(x_n) &\approx f(x_{n-1}) + f'(x_{n-1})h \\
&= f(x_{n-1})(1 + h) \approx \bigl(f(x_{n-2}) + f'(x_{n-2})h\bigr)(1 + h) \\
&= f(x_{n-2})(1 + h)^2 \approx \cdots \approx f(x_0)(1 + h)^n \\
&= f(0)(1 + h)^n = \left(1 + \frac{x}{n}\right)^n .
\end{aligned}$$

It seems natural (and this can be proved) that the smaller the step $h = \frac{x}{n}$,
the closer the approximation in the formula $f(x) \approx \left(1 + \frac{x}{n}\right)^n$.

Thus we arrive at the conclusion that

$$f(x) = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^n .$$

In particular, if we denote the quantity $f(1) = \lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^n$ by e and
show that $e \neq 1$, we shall have obtained

$$f(x) = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^n = \lim_{t \to 0}(1 + t)^{x/t} = \lim_{t \to 0}\left[(1 + t)^{1/t}\right]^x = e^x , \tag{5.154}$$

since we know that $u^\alpha \to v^\alpha$ if $u \to v$.

This method of solving Eq. (5.153) numerically, which enabled us to obtain
formula (5.154), was proposed by Euler long ago, and is called Euler’s
polygonal method. This name is connected with the fact that the computations
carried out in it have a geometric interpretation as the replacement
of the solution f(x) of the equation (or rather its graph) by an approximating
graph consisting of a broken line whose links on the corresponding
closed intervals $[x_k, x_{k+1}]$ ($k = 0, \ldots, n - 1$) are given by the equations
$y = f(x_k) + f'(x_k)(x - x_k)$ (see Fig. 5.34).
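Euler's polygonal method for Eq. (5.153) takes only a few lines to program. The following Python sketch is an added illustration (the function name is our own); it performs the n steps described above and compares the result with the library value of the exponential.

```python
import math

def euler_exp(x, n):
    """Euler's polygonal method for f' = f, f(0) = 1, with n steps of size h = x/n."""
    h = x / n
    f = 1.0                      # f(x_0) = f(0) = 1
    for _ in range(n):
        f = f + f * h            # f(x_{k+1}) ~ f(x_k) + f'(x_k)*h, where f' = f
    return f                     # equals (1 + x/n)**n up to rounding

for n in (1, 10, 100, 1000, 100000):
    approx = euler_exp(1.0, n)
    print(n, approx, abs(approx - math.e))
```

The printed errors decrease roughly in proportion to 1/n, which illustrates both the convergence of $(1 + 1/n)^n$ to e and how slow that convergence is.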

We have also encountered the definition of the function exp x as the sum
of the power series $\sum_{n=0}^{\infty}\frac{1}{n!}x^n$. This definition can also be reached from Eq.
(5.153) by using the following frequently-applied device, called the method of
undetermined coefficients. We seek a solution of Eq. (5.153) as the sum of a
power series

$$f(x) = c_0 + c_1 x + \cdots + c_n x^n + \cdots , \tag{5.155}$$

whose coefficients are to be determined.

As we have seen (Theorem 1 of Sect. 5.5) Eq. (5.155) implies that $c_n =
\frac{f^{(n)}(0)}{n!}$. But, by (5.153), $f(0) = f'(0) = \cdots = f^{(n)}(0) = \cdots$, and since f(0) =
1, we have $c_n = \frac{1}{n!}$, that is, if the solution has the form (5.155) and f(0) = 1,
then necessarily

$$f(x) = 1 + \frac{1}{1!}x + \frac{1}{2!}x^2 + \cdots + \frac{1}{n!}x^n + \cdots .$$

We could have verified independently that the function defined by this 
series is indeed differentiable (and not only at x = 0) and that it satisfies Eq. 
(5.153) and the initial condition f(0) = 1. However, we shall not linger over 
this point, since our purpose was only to find out whether the introduction 
of the exponential function as the solution of Eq. (5.153) with the initial 
condition f(0) = 1 was in agreement with what we had previously meant by 
the function exp x. 

We remark that Eq. (5.153) could have been studied in the complex plane,
that is, we could have regarded x as an arbitrary complex number. When this
is done, the reasoning we have carried out remains valid, although some of
the geometric intuitiveness of Euler’s method may be lost.

Thus it is natural to expect that the function

$$e^z = 1 + \frac{1}{1!}z + \frac{1}{2!}z^2 + \cdots + \frac{1}{n!}z^n + \cdots$$

is the unique solution of the equation

$$f'(z) = f(z)$$

satisfying the condition f(0) = 1.
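The partial sums of this series are easy to compute for real and for complex arguments alike. The Python sketch below is an added illustration; the function name and the default number of terms (30, sufficient for arguments of moderate size) are our own assumptions.

```python
import math

def exp_series(z, terms=30):
    """Partial sum 1 + z/1! + z**2/2! + ... with the given number of terms."""
    total = 0.0
    term = 1.0                   # current term z**n / n!, starting with n = 0
    for n in range(terms):
        total += term
        term *= z / (n + 1)      # passes from z**n/n! to z**(n+1)/(n+1)!
    return total

print(exp_series(1.0), math.e)       # real argument
print(exp_series(1j * math.pi))      # complex argument: close to -1
```

The second line of output is a numerical glimpse of Euler's formula, which is used in the next subsection.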


5.6.6 Oscillations 


If a body suspended from a spring is displaced from its equilibrium position, 
for example by lifting it and then dropping it, it will oscillate about its 
equilibrium position. Let us describe this process in its general form. 

Suppose it is known that a force is acting on a point mass m that is free
to move along the x-axis, and that the force $F = -kx$ is proportional³¹ to
the displacement of the point from the origin. Suppose also that we know the
initial position $x_0 = x(0)$ of the point mass and its initial velocity $v_0 = \dot{x}(0)$.
Let us find the dependence x = x(t) of the position of the point on time.

By Newton’s law, this problem can be rewritten in the following purely 
mathematical form: Solve the equation 


$$m\ddot{x}(t) = -kx(t) \tag{5.156}$$

under the initial conditions $x_0 = x(0)$, $\dot{x}(0) = v_0$.
Let us rewrite Eq. (5.156) as

$$\ddot{x}(t) + \frac{k}{m}x(t) = 0 \tag{5.157}$$

and again try to make use of the exponential. Specifically, let us try to choose
the number $\lambda$ so that the function $x(t) = e^{\lambda t}$ satisfies Eq. (5.157).
Making the substitution $x(t) = e^{\lambda t}$ in (5.157), we obtain

$$\left(\lambda^2 + \frac{k}{m}\right)e^{\lambda t} = 0 ,$$

or

$$\lambda^2 + \frac{k}{m} = 0 , \tag{5.158}$$

that is, $\lambda_1 = -\sqrt{-\frac{k}{m}}$, $\lambda_2 = \sqrt{-\frac{k}{m}}$. Since m > 0, we have the two imaginary
numbers $\lambda_1 = -i\sqrt{\frac{k}{m}}$, $\lambda_2 = i\sqrt{\frac{k}{m}}$ when k > 0. We had not reckoned on this
possibility; however, let us continue our study. By Euler’s formula

$$e^{-i\sqrt{k/m}\,t} = \cos\sqrt{\frac{k}{m}}\,t - i\sin\sqrt{\frac{k}{m}}\,t ,$$

$$e^{i\sqrt{k/m}\,t} = \cos\sqrt{\frac{k}{m}}\,t + i\sin\sqrt{\frac{k}{m}}\,t .$$

31 In the case of a spring, the coefficient k > 0 characterizing its stiffness is called
the modulus.


Since differentiating with respect to the real variable t amounts to differentiating
the real and imaginary parts of the function $e^{\lambda t}$ separately, Eq.
(5.157) must be satisfied by both functions $\cos\sqrt{\frac{k}{m}}\,t$ and $\sin\sqrt{\frac{k}{m}}\,t$. And this
is indeed the case, as one can easily verify directly. Thus the complex exponential
function has enabled us to guess two solutions of Eq. (5.157), any
linear combination of which

$$x(t) = c_1\cos\sqrt{\frac{k}{m}}\,t + c_2\sin\sqrt{\frac{k}{m}}\,t \tag{5.159}$$

is obviously also a solution of Eq. (5.157).


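We choose the coefficients $c_1$ and $c_2$ in (5.159) from the conditions

$$x_0 = x(0) = c_1 ,$$

$$v_0 = \dot{x}(0) = \left.\left(-c_1\sqrt{\frac{k}{m}}\sin\sqrt{\frac{k}{m}}\,t + c_2\sqrt{\frac{k}{m}}\cos\sqrt{\frac{k}{m}}\,t\right)\right|_{t=0} = c_2\sqrt{\frac{k}{m}} .$$

Thus the function

$$x(t) = x_0\cos\sqrt{\frac{k}{m}}\,t + v_0\sqrt{\frac{m}{k}}\sin\sqrt{\frac{k}{m}}\,t \tag{5.160}$$

is the required solution.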
By making standard transformations we can rewrite (5.160) in the form

$$x(t) = \sqrt{x_0^2 + v_0^2\frac{m}{k}}\,\sin\left(\sqrt{\frac{k}{m}}\,t + \alpha\right) , \tag{5.161}$$

where

$$\alpha = \arcsin\frac{x_0}{\sqrt{x_0^2 + v_0^2\frac{m}{k}}} .$$

Thus, for k > 0 the point will make periodic oscillations with period
$T = 2\pi\sqrt{\frac{m}{k}}$, that is, with frequency $\frac{1}{T} = \frac{1}{2\pi}\sqrt{\frac{k}{m}}$, and amplitude $\sqrt{x_0^2 + v_0^2\frac{m}{k}}$.
We state this because it is clear from physical considerations that the solution
(5.160) is unique. (See Problem 5 at the end of this section.)

The motion described by (5.161) is called a simple harmonic oscillation,
and Eq. (5.157) the equation of a simple harmonic oscillator.

Let us now turn to the case when k < 0 in Eq. (5.158). Then the two
functions $e^{\lambda_1 t} = \exp\left(-\sqrt{-\frac{k}{m}}\,t\right)$ and $e^{\lambda_2 t} = \exp\left(\sqrt{-\frac{k}{m}}\,t\right)$ are real-valued
solutions of Eq. (5.157) and the function

$$x(t) = c_1 e^{\lambda_1 t} + c_2 e^{\lambda_2 t} \tag{5.162}$$

is also a solution. We choose the constants $c_1$ and $c_2$ from the conditions

$$x_0 = x(0) = c_1 + c_2 ,$$

$$v_0 = \dot{x}(0) = c_1\lambda_1 + c_2\lambda_2 .$$

This system of linear equations always has a unique solution, since its
determinant $\lambda_2 - \lambda_1$ is not 0.

Since the numbers $\lambda_1$ and $\lambda_2$ are of opposite sign, it can be seen from
(5.162) that for k < 0 the force $F = -kx$ not only has no tendency to restore
the point to its equilibrium position at x = 0, but in fact as time goes on,
carries it an unlimited distance away from this position if $x_0$ or $v_0$ is nonzero.
That is, in this case x = 0 is a point of unstable equilibrium.
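Both cases of Eq. (5.156) can be collected in one short function. The Python sketch below is an added illustration, not part of the original text; the function name and the sample data are our own. For k > 0 it evaluates formula (5.160), and for k < 0 it solves the linear system for $c_1$, $c_2$ stated above and evaluates (5.162).

```python
import math

def position(t, x0, v0, k, m):
    """Solution of m*x'' = -k*x with x(0) = x0, x'(0) = v0, for k != 0."""
    if k > 0:
        w = math.sqrt(k / m)
        # Formula (5.160); note that v0*sqrt(m/k) = v0/w.
        return x0 * math.cos(w * t) + (v0 / w) * math.sin(w * t)
    lam = math.sqrt(-k / m)          # lambda_1 = -lam, lambda_2 = +lam
    c1 = 0.5 * (x0 - v0 / lam)       # from c1 + c2 = x0, -lam*c1 + lam*c2 = v0
    c2 = 0.5 * (x0 + v0 / lam)
    return c1 * math.exp(-lam * t) + c2 * math.exp(lam * t)   # formula (5.162)

# Hypothetical data: m = 1, x0 = 3, v0 = 4.
period = 2.0 * math.pi               # T = 2*pi*sqrt(m/k) for k = m = 1
print(position(1.0, 3.0, 4.0, 1.0, 1.0), position(1.0 + period, 3.0, 4.0, 1.0, 1.0))
print(position(5.0, 3.0, 4.0, -1.0, 1.0))   # k < 0: the point runs away
```

For k = m = 1 the two values in the first line coincide up to rounding, as periodicity requires, and the amplitude $\sqrt{x_0^2 + v_0^2 m/k}$ equals 5 for these data.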

In conclusion let us consider a very natural modification of Eq. (5.156), 
in which the usefulness of the exponential function and Euler’s formula con- 
necting the basic elementary functions shows up even more clearly. 

Let us assume that the particle we are considering moves in a medium (the 
air or a liquid) whose resistance cannot be neglected. Suppose the resisting 
force is proportional to the velocity of the point. Then, instead of Eq. (5.156) 
we must write 

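$$m\ddot{x}(t) = -\alpha\dot{x}(t) - kx(t) ,$$

which we rewrite as

$$\ddot{x}(t) + \frac{\alpha}{m}\dot{x}(t) + \frac{k}{m}x(t) = 0 . \tag{5.163}$$

If once again we seek a solution of the form $x(t) = e^{\lambda t}$, we arrive at the
quadratic equation

$$\lambda^2 + \frac{\alpha}{m}\lambda + \frac{k}{m} = 0 ,$$

whose roots are $\lambda_{1,2} = -\frac{\alpha}{2m} \pm \frac{\sqrt{\alpha^2 - 4mk}}{2m}$.

The case when $\alpha^2 - 4mk > 0$ leads to two real roots $\lambda_1$ and $\lambda_2$, and the
solution can be found in the form (5.162).

We shall study in more detail the case in which we are more interested,
when $\alpha^2 - 4mk < 0$. Then both roots $\lambda_1$ and $\lambda_2$ are complex, but not purely
imaginary:

$$\lambda_1 = -\frac{\alpha}{2m} - i\frac{\sqrt{4mk - \alpha^2}}{2m} , \qquad \lambda_2 = -\frac{\alpha}{2m} + i\frac{\sqrt{4mk - \alpha^2}}{2m} .$$

In this case Euler’s formula yields

$$e^{\lambda_1 t} = \exp\left(-\frac{\alpha}{2m}t\right)(\cos\omega t - i\sin\omega t) ,$$

$$e^{\lambda_2 t} = \exp\left(-\frac{\alpha}{2m}t\right)(\cos\omega t + i\sin\omega t) ,$$

where $\omega = \frac{\sqrt{4mk - \alpha^2}}{2m}$. Thus we find the two real-valued solutions
$\exp\left(-\frac{\alpha}{2m}t\right)\cos\omega t$ and $\exp\left(-\frac{\alpha}{2m}t\right)\sin\omega t$ of Eq. (5.163), which would have
been very difficult to guess. We then seek a solution of the original equation
in the form of a linear combination of these two

$$x(t) = \exp\left(-\frac{\alpha}{2m}t\right)(c_1\cos\omega t + c_2\sin\omega t) , \tag{5.164}$$

choosing $c_1$ and $c_2$ so that the initial conditions $x(0) = x_0$ and $\dot{x}(0) = v_0$ are
satisfied.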

The system of linear equations that results, as one can verify, always has 
a unique solution. Thus, after transformations, we obtain the solution of the 
problem from (5.164) in the form 


$$x(t) = A\exp\left(-\frac{\alpha}{2m}t\right)\sin(\omega t + a) , \tag{5.165}$$


where A and a are constants determined by the initial conditions. 
It can be seen from this formula that, because of the factor $\exp\left(-\frac{\alpha}{2m}t\right)$,
when $\alpha > 0$ and m > 0, the oscillations will be damped and the rate of
damping of the amplitude depends on the ratio $\frac{\alpha}{m}$. The frequency of the
oscillations $\frac{1}{2\pi}\omega = \frac{1}{2\pi}\sqrt{\frac{k}{m} - \left(\frac{\alpha}{2m}\right)^2}$ will not vary over time. The quantity $\omega$
also depends only on the ratios $\frac{k}{m}$ and $\frac{\alpha}{m}$, which, however, could have been
foreseen from the form (5.163) of the original equation. When $\alpha = 0$, we
again return to undamped harmonic oscillations (5.161) and Eq. (5.157).
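The damped solution (5.164) can also be checked numerically. In the Python sketch below, an added illustration with names and sample data of our own choosing, the constants $c_1$ and $c_2$ are found from the initial conditions ($c_1 = x_0$, and $-\frac{\alpha}{2m}c_1 + \omega c_2 = v_0$), and the residual of Eq. (5.163) is estimated by finite differences. The sketch assumes the underdamped case $\alpha^2 < 4mk$.

```python
import math

def damped_position(t, x0, v0, alpha, k, m):
    """Formula (5.164) for the underdamped case alpha**2 < 4*m*k."""
    decay = alpha / (2.0 * m)
    w = math.sqrt(4.0 * m * k - alpha ** 2) / (2.0 * m)
    c1 = x0                           # from x(0) = x0
    c2 = (v0 + decay * x0) / w        # from x'(0) = -decay*c1 + w*c2 = v0
    return math.exp(-decay * t) * (c1 * math.cos(w * t) + c2 * math.sin(w * t))

def residual(t, x0, v0, alpha, k, m, h=1e-4):
    """Finite-difference estimate of m*x'' + alpha*x' + k*x, which should be near 0."""
    x_minus = damped_position(t - h, x0, v0, alpha, k, m)
    x_mid = damped_position(t, x0, v0, alpha, k, m)
    x_plus = damped_position(t + h, x0, v0, alpha, k, m)
    first = (x_plus - x_minus) / (2.0 * h)
    second = (x_plus - 2.0 * x_mid + x_minus) / (h * h)
    return m * second + alpha * first + k * x_mid

# Hypothetical data: m = 1, alpha = 0.4, k = 4, x0 = 1, v0 = 0.
for t in (0.0, 1.0, 2.0, 5.0, 10.0):
    print(t, damped_position(t, 1.0, 0.0, 0.4, 4.0, 1.0))
print("residual at t = 1.3:", residual(1.3, 1.0, 0.0, 0.4, 4.0, 1.0))
```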


5.6.7 Problems and Exercises 


1. Efficiency in rocket propulsion. 


a) Let Q be the chemical energy of a unit mass of rocket fuel and w the outflow
speed of the fuel. Then $\frac{1}{2}w^2$ is the kinetic energy of a unit mass of fuel when ejected.
The coefficient $\alpha$ in the equation $\frac{1}{2}w^2 = \alpha Q$ is the efficiency of the processes of
burning and outflow of the fuel. For solid fuel (smokeless powder) w = 2 km/s
and Q = 1000 kcal/kg, and for liquid fuel (gasoline with oxygen) w = 3 km/s and
Q = 2500 kcal/kg. Determine the efficiency $\alpha$ for these cases.


b) The efficiency of a rocket is defined as the ratio of its final kinetic energy
$m_R\frac{v^2}{2}$ to the chemical energy of the fuel burned $m_F Q$. Using formula (5.139), obtain
a formula for the efficiency of a rocket in terms of $m_R$, $m_F$, Q, and $\alpha$ (see part a)).


c) Evaluate the efficiency of an automobile with a liquid-fuel jet engine, if the
automobile is accelerated to the usual city speed limit of 60 km/h.

d) Evaluate the efficiency of a liquid-fuel rocket carrying a satellite into low
orbit around the earth.


e) Determine the final speed for which rocket propulsion using liquid fuel is 
maximally efficient. 


f) Which ratio of masses $m_F/m_R$ yields the highest possible efficiency for any
kind of fuel? 


2. The barometric formula. 


a) Using the data from Subsect. 5.6.2, obtain a formula for a correction term to 
take account of the dependence of pressure on the temperature of the air column, 
if the temperature is subject to variation (for example, seasonal) within the range
±40° C.


b) Use formula (5.144) to determine the dependence of pressure on elevation 
at temperatures of —40° C, 0° C, and 40° C, and compare these results with the 
results given by your approximate formula from part a). 


c) Suppose the air temperature in the column varies with elevation according
to the law $T'(h) = -\alpha T_0$, where $T_0$ is the air temperature at the surface of the
earth and $\alpha = 7 \cdot 10^{-7}$ cm⁻¹. Derive a formula for the dependence of pressure on
elevation under these conditions.


d) Find the pressure in a mine shaft at depths of 1km, 3km and 9km using 
formula (5.144) and the formula that you obtained in c). 


e) Independently of altitude, air consists of approximately 1/5 oxygen. The 
partial pressure of oxygen is also approximately 1/5 of the air pressure. A certain 
species of fish can live under a partial pressure of oxygen not less than 0.15 atmo- 
spheres. Should one expect to find this species in a river at sea level? Could it be 
found in a river emptying into Lake Titicaca at an elevation of 3.81 km? 


3. Radioactive decay. 


a) By measuring the amount of a radioactive substance and its decay products in 
ore samples of the Earth, assuming that no decay products were originally present, 
one can estimate the age of the Earth (at least from the time when the substance 
appeared). Suppose that in a rock there are m grams of a radioactive substance 
and r grams of its decay product. Knowing the half-life T of the substance, find 
the time elapsed since the decay began and the amount of radioactive substance in 
a sample of the same volume at the initial time. 


b) Atoms of radium in an ore constitute approximately $10^{-12}$ of the total number
of atoms. What was the radium content $10^5$, $10^6$, and $5 \cdot 10^9$ years ago? (The
age of the Earth is estimated at $5 \cdot 10^9$ years.)


c) In the diagnosis of kidney diseases one often measures the ability of the 
kidneys to remove from the blood various substances deliberately introduced into 
the body, for example creatin (the “clearance test”). An example of an opposite 
process of the same type is the restoration of the concentration of hemoglobin in 
the blood of a donor or of a patient who has suddenly lost a large amount of 
blood. In all these cases the decrease in the quantity of the substance introduced 
(or, conversely, the restoration of an insufficient quantity) is subject to the law 
$N = N_0 e^{-t/\tau}$, where N is the amount (in other words, the number of molecules) of
the substance remaining in the body after time t has elapsed from the introduction
of the amount $N_0$ and $\tau$ is the so-called lifetime: the time elapsed when 1/e of the
quantity originally introduced remains in the body. The lifetime, as one can easily 
verify, is 1.44 times larger than the half-life, which is the time elapsed when half of 
the original quantity of the substance remains. 

Suppose a radioactive substance leaves the body at a rate characterized by the
lifetime $\tau_0$, and at the same time decays spontaneously with lifetime $\tau_d$. Show that
in this case the lifetime $\tau$ characterizing the time the substance remains in the body
is determined by the relation $\tau^{-1} = \tau_0^{-1} + \tau_d^{-1}$.


d) A certain quantity of blood containing 201 mg of iron has been taken from a 
donor. To make up for this loss of iron, the donor was ordered to take iron sulfate 
tablets three times a day for a week, each tablet containing 67 mg of iron. The 
amount of iron in the donor’s blood returns to normal according to an exponential 
law with lifetime equal to approximately seven days. Assuming that the iron from 
the tablets enters the bloodstream most rapidly immediately after the blood is 
taken, determine approximately the portion of the iron in the tablets that will 
enter the blood over the time needed to restore the normal iron content in the 
blood. 


e) A certain quantity of radioactive phosphorus $\mathrm{P}^{32}$ was administered to diagnose
a patient with a malignant tumor, after which the radioactivity of the skin
of the thigh was measured at regular time intervals. The decrease in radioactivity 
was subject to an exponential law. Since the half-life of phosphorus is known to be 
14.3 days, it was possible to use the data thus obtained to determine the lifetime 
for the process of decreasing radioactivity as a result of biological causes. Find this 
constant if it has been established by observation that the lifetime for the overall 
decrease in radioactivity was 9.4 days (see part c) above). 


4. Absorption of radiation. The passage of radiation through a medium is accom- 
panied by partial absorption of the radiation. In many cases (the linear theory) one 
can assume that the absorption in passing through a layer two units thick is the 
same as the absorption in successively passing through two layers, each one unit 
thick. 


a) Show that under this condition the absorption of radiation is subject to the
law $I = I_0 e^{-kl}$, where $I_0$ is the intensity of the radiation falling on the absorbing
substance, $I$ is the intensity after passing through a layer of thickness $l$, and $k$ is a
coefficient having the physical dimension inverse to length.


b) In the case of absorption of light by water, the coefficient $k$ depends on
the wave length of the incident light, for example as follows: for ultraviolet
$k = 1.4 \cdot 10^{-2}\ \mathrm{cm}^{-1}$; for blue $k = 4.6 \cdot 10^{-4}\ \mathrm{cm}^{-1}$; for green $k = 4.4 \cdot 10^{-4}\ \mathrm{cm}^{-1}$; for red
$k = 2.9 \cdot 10^{-3}\ \mathrm{cm}^{-1}$. Sunlight is falling vertically on the surface of a pure lake 10
meters deep. Compare the intensities of the components of sunlight listed above
at the surface of the lake and at the bottom.


5. Show that if the law of motion of a point $x = x(t)$ satisfies the equation
$m\ddot{x} + kx = 0$ for harmonic oscillations, then


a) the quantity $E = \frac{m\dot{x}^2(t)}{2} + \frac{kx^2(t)}{2}$ is constant ($E = K + U$ is the sum of
the kinetic energy $K = \frac{m\dot{x}^2}{2}$ of the point and its potential energy $U = \frac{kx^2}{2}$ at
time $t$);

b) if $x(0) = 0$ and $\dot{x}(0) = 0$, then $x(t) \equiv 0$;

c) there exists a unique motion $x = x(t)$ with initial conditions $x(0) = x_0$ and
$\dot{x}(0) = v_0$.

d) Verify that if the point moves in a medium with friction and $x = x(t)$ satisfies
the equation $m\ddot{x} + \alpha\dot{x} + kx = 0$, $\alpha > 0$, then the quantity $E$ (see part a)) decreases.
Find its rate of decrease and explain the physical meaning of the result, taking 
account of the physical meaning of E. 


6. Motion under the action of a Hooke$^{32}$ central force (the plane oscillator).

To develop Eq. (5.156) for a linear oscillator in Subsect. 5.6.6 and in Problem
5 let us consider the equation $m\ddot{\mathbf{r}}(t) = -k\mathbf{r}(t)$ satisfied by the radius-vector $\mathbf{r}(t)$
of a point of mass $m$ moving in space under the attraction of a centripetal force
proportional to the distance $|\mathbf{r}(t)|$ from the center with constant of proportionality
(modulus) $k > 0$. Such a force arises if the point is joined to the center by a Hooke
elastic connection, for example, a spring with constant $k$.


a) By differentiating the vector product $\mathbf{r}(t) \times \dot{\mathbf{r}}(t)$, show that the motion takes
place in the plane passing through the center and containing the initial position
vector $\mathbf{r}_0 = \mathbf{r}(t_0)$ and the initial velocity vector $\dot{\mathbf{r}}_0 = \dot{\mathbf{r}}(t_0)$ (a plane oscillator). If
the vectors $\mathbf{r}_0 = \mathbf{r}(t_0)$ and $\dot{\mathbf{r}}_0 = \dot{\mathbf{r}}(t_0)$ are collinear, the motion takes place along
the line containing the center and the vector $\mathbf{r}_0$ (the linear oscillator considered in
Subsect. 5.6.6).


b) Verify that the orbit of a plane oscillator is an ellipse and that the motion is 
periodic. Find the period of revolution. 


c) Show that the quantity $E = m\dot{\mathbf{r}}^2(t) + k\mathbf{r}^2(t)$ is conserved (constant in time).


d) Show that the initial data $\mathbf{r}_0 = \mathbf{r}(t_0)$ and $\dot{\mathbf{r}}_0 = \dot{\mathbf{r}}(t_0)$ completely determine
the subsequent motion of the point.


7. Ellipticity of planetary orbits.


The preceding problem makes it possible to regard the motion of a point under 
the action of a central Hooke force as taking place in a plane. Suppose this plane is 
the plane of the complex variable z = x+iy. The motion is determined by two real- 
valued functions x = x(t), y = y(t) or, what is the same, by one complex-valued 
function $z = z(t)$ of time $t$. Assuming for simplicity in Problem 6 that $m = 1$ and
$k = 1$, consider the simplest form of the equation of such motion $\ddot{z}(t) = -z(t)$.


a) Knowing from Problem 6 that the solution of this equation corresponding
to the specific initial data $z_0 = z(t_0)$, $\dot{z}_0 = \dot{z}(t_0)$ is unique, find it in the form
$z(t) = c_1 e^{it} + c_2 e^{-it}$ and, using Euler’s formula, verify once again that the trajectory
of motion is an ellipse with center at zero. (In certain cases it may become a circle
or degenerate into a line segment; determine when.)


b) Taking account of the invariance of the quantity $|\dot{z}(t)|^2 + |z(t)|^2$ during the
motion of a point $z(t)$ subject to the equation $\ddot{z}(t) = -z(t)$, verify that, in terms


32 R. Hooke (1635-1703) — British scientist, a versatile scholar and experimenter. 
He discovered the cell structure of tissues and introduced the word cell. He was 
one of the founders of the mathematical theory of elasticity and the wave theory 
of light; he stated the hypothesis of gravitation and the inverse-square law for 
gravitational interaction. 
of a new (time) parameter $\tau$ connected with $t$ by a relation $\tau = \tau(t)$ such that
$\frac{d\tau}{dt} = |z(t)|^2$, the point $w(t) = z^2(t)$ moves subject to the equation $\frac{d^2 w}{d\tau^2} = -c\frac{w}{|w|^3}$,
where $c$ is a constant and $w = w(t(\tau))$. Thus motion in a central Hooke force field
and motion in a Newtonian gravitational field turn out to be connected.

c) Compare this with the result of Problem 8 of Sect. 5.5 and prove that plan- 
etary orbits are ellipses. 


d) If you have access to a computer, looking again at Euler’s method, explained
in Subsect. 5.6.5, first compute several values of $e^x$ using this method. (Observe that
this method uses nothing except the definition of the differential, more precisely
the formula $f(x_n) \approx f(x_{n-1}) + f'(x_{n-1})h$, where $h = x_n - x_{n-1}$.)


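Now let $\mathbf{r}(t) = (x(t), y(t))$, $\mathbf{r}_0 = \mathbf{r}(0) = (1, 0)$, $\dot{\mathbf{r}}_0 = \dot{\mathbf{r}}(0) = (0, 1)$ and
$\ddot{\mathbf{r}}(t) = -\frac{\mathbf{r}(t)}{|\mathbf{r}(t)|^3}$. Using the formulas

$$\mathbf{r}(t_n) \approx \mathbf{r}(t_{n-1}) + \mathbf{v}(t_{n-1})h,$$
$$\mathbf{v}(t_n) \approx \mathbf{v}(t_{n-1}) + \mathbf{a}(t_{n-1})h,$$

where $\mathbf{v}(t) = \dot{\mathbf{r}}(t)$, $\mathbf{a}(t) = \dot{\mathbf{v}}(t) = \ddot{\mathbf{r}}(t)$, use Euler’s method to compute the trajectory
of the point. Observe its shape and how it is traversed by a point as time passes.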
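One possible way to set up the computation in part d) is sketched below. It assumes the equation $\ddot{\mathbf{r}} = -\mathbf{r}/|\mathbf{r}|^3$ with the initial data $(1, 0)$ and $(0, 1)$; the function names, the step size and the number of steps are our own choices for illustration.

```python
import math

def euler_exp(x, n):
    """Approximate e**x by n Euler steps for f' = f, f(0) = 1."""
    h = x / n
    f = 1.0
    for _ in range(n):
        f += f * h                    # f(x_n) ~ f(x_{n-1}) + f'(x_{n-1}) h
    return f

def euler_orbit(r0, v0, h, steps):
    """Euler's method for r'' = -r / |r|**3 in the plane; returns the list of positions."""
    (x, y), (vx, vy) = r0, v0
    path = [(x, y)]
    for _ in range(steps):
        d3 = math.hypot(x, y) ** 3
        ax, ay = -x / d3, -y / d3     # acceleration at the previous point
        # all four updates use the values from the previous step
        x, y, vx, vy = x + vx * h, y + vy * h, vx + ax * h, vy + ay * h
        path.append((x, y))
    return path

print(euler_exp(1.0, 1000), math.e)
path = euler_orbit((1.0, 0.0), (0.0, 1.0), h=1e-3, steps=1000)
print(path[-1], (math.cos(1.0), math.sin(1.0)))
```

For these initial data the exact trajectory is the unit circle, so the last line compares the computed position at $t = 1$ with $(\cos 1, \sin 1)$. Over long times the explicit Euler scheme drifts slowly outward, which is worth observing when plotting the path.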


5.7 Primitives 


In differential calculus, as we have verified on the examples of the previous 
section, in addition to knowing how to differentiate functions and write re- 
lations between their derivatives, it is also very valuable to know how to 
find functions from relations satisfied by their derivatives. The simplest such 
problem, but, as will be seen below, a very important one, is the problem 
of finding a function $F(x)$ knowing its derivative $F'(x) = f(x)$. The present
section is devoted to an introductory discussion of that problem. 


5.7.1 The Primitive and the Indefinite Integral


Definition 1. A function $F(x)$ is a primitive of a function $f(x)$ on an
interval if $F$ is differentiable on the interval and satisfies the equation
$F'(x) = f(x)$, or, what is the same, $dF(x) = f(x)\,dx$.


Example 1. The function $F(x) = \arctan x$ is a primitive of $f(x) = \frac{1}{1+x^2}$ on
the entire real line, since $\arctan' x = \frac{1}{1+x^2}$.


Example 2. The function $F(x) = \operatorname{arccot}\frac{1}{x}$ is a primitive of $f(x) = \frac{1}{1+x^2}$ on
the set of positive real numbers and on the set of negative real numbers, since
for $x \neq 0$

$$F'(x) = -\frac{1}{1 + \left(\frac{1}{x}\right)^2}\cdot\left(-\frac{1}{x^2}\right) = \frac{1}{1+x^2} = f(x).$$

What is the situation in regard to the existence of a primitive, and what
is the set of primitives of a given function?

In the integral calculus we shall prove the fundamental fact that every 
function that is continuous on an interval has a primitive on that interval. 

We present this fact for the reader’s information, but in the present section 
we shall essentially use only the following characteristic of the set of primitives 
of a given function on an interval, already known to us (see Subsect. 5.3.1) 
from Lagrange’s theorem. 


Proposition 1. If $F_1(x)$ and $F_2(x)$ are two primitives of $f(x)$ on the same
interval, then the difference $F_1(x) - F_2(x)$ is constant on that interval.


The hypothesis that $F_1$ and $F_2$ are being compared on a connected interval
is essential, as was pointed out in the proof of this proposition. One can
also see this by comparing Examples 1 and 2, in which the derivatives of
$F_1(x) = \arctan x$ and $F_2(x) = \operatorname{arccot}\frac{1}{x}$ agree on the entire domain $\mathbb{R} \setminus 0$ that
they have in common. However,

$$F_1(x) - F_2(x) = \arctan x - \operatorname{arccot}\frac{1}{x} = \arctan x - \arctan x = 0$$

for $x > 0$, while $F_1(x) - F_2(x) \equiv -\pi$ for $x < 0$. For if $x < 0$, we have
$\operatorname{arccot}\frac{1}{x} = \pi + \arctan x$.
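The jump in the constant between the two half-lines is easy to observe numerically. The sketch below assumes the convention that the arccotangent takes values in $]0, \pi[$, which is the one consistent with the relation for $x < 0$ just stated; the helper names are illustrative.

```python
import math

def arccot(y):
    """Arccotangent with values in (0, pi), the convention assumed here."""
    return math.pi / 2 - math.atan(y)

def difference(x):
    """F1(x) - F2(x) for F1 = arctan x and F2 = arccot(1/x), x != 0."""
    return math.atan(x) - arccot(1.0 / x)

for x in (0.5, 3.0, -0.5, -3.0):
    print(x, difference(x))
```

The printed differences are constant on each of the two intervals $x > 0$ and $x < 0$, but the two constants differ, as Proposition 1 permits on a disconnected domain.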

Like the operation of taking the differential, which has the name “differentiation”
and the mathematical notation $dF(x) = F'(x)\,dx$, the operation
of finding a primitive has the name “indefinite integration” and the mathematical
notation

$$\int f(x)\,dx, \tag{5.166}$$

called the indefinite integral of $f(x)$ on the given interval.

Thus we shall interpret the expression (5.166) as a notation for any of the
primitives of $f$ on the interval in question.

In the notation (5.166) the sign $\int$ is called the indefinite integral sign, $f$
is called the integrand, and $f(x)\,dx$ is called a differential form.

It follows from Proposition 1 that if F(x) is any particular primitive of 
f(x) on the interval, then on that interval 


$$\int f(x)\,dx = F(x) + C, \tag{5.167}$$


that is, any other primitive can be obtained from the particular primitive 
F(x) by adding a constant. 

If F’(x) = f(x), that is, F is a primitive of f on some interval, then by 
(5.167) we have 


$$d\int f(x)\,dx = dF(x) = F'(x)\,dx = f(x)\,dx. \tag{5.168}$$

Moreover, in accordance with the concept of an indefinite integral as any
primitive, it also follows from (5.167) that


$$\int dF(x) = \int F'(x)\,dx = F(x) + C. \tag{5.169}$$


Formulas (5.168) and (5.169) establish a reciprocity between the opera- 
tions of differentiation and indefinite integration. These operations are mu- 
tually inverse up to the undetermined constant C that appears in (5.169). 

Up to this point we have discussed only the mathematical nature of the
constant $C$ in (5.167). We now give its physical meaning using a simple
example. Suppose a point is moving along a line in such a way that its
velocity $v(t)$ is known as a function of time (for example, $v(t) \equiv v$). If $x(t)$ is
the coordinate of the point at time $t$, the function $x(t)$ satisfies the equation
$\dot{x}(t) = v(t)$, that is, $x(t)$ is a primitive of $v(t)$. Can the position of a point on
a line be recovered knowing its velocity over a certain time interval? Clearly
not. From the velocity and the time interval one can determine the length
$s$ of the path traversed during this time, but not the position on the line.
However, the position will also be completely determined if it is given at
even one instant, for example, $t = 0$, that is, we give the initial condition
$x(0) = x_0$. Until the initial condition is given, the law of motion could be
any law of the form $x(t) = \tilde{x}(t) + c$, where $\tilde{x}(t)$ is any particular primitive
of $v(t)$ and $c$ is an arbitrary constant. But once the initial condition $x(0) = x_0$
is given, all the indeterminacy disappears; for we must have
$x(0) = \tilde{x}(0) + c = x_0$, that is, $c = x_0 - \tilde{x}(0)$ and $x(t) = x_0 + [\tilde{x}(t) - \tilde{x}(0)]$. This
last formula is entirely physical, since the arbitrary primitive $\tilde{x}$ appears in it
only as the difference that determines the path traversed or the magnitude
of the displacement from the known initial point $x(0) = x_0$.


5.7.2 The Basic General Methods of Finding a Primitive 


In accordance with the definition of the expression (5.166) for the indefinite 
integral, this expression denotes a function whose derivative is the integrand. 
From this definition, taking account of (5.167) and the laws of differentiation, 
one can assert that the following relations hold: 


a. $$\int (\alpha u(x) + \beta v(x))\,dx = \alpha\int u(x)\,dx + \beta\int v(x)\,dx + c. \tag{5.170}$$

b. $$\int (uv)'\,dx = \int u'(x)v(x)\,dx + \int u(x)v'(x)\,dx + c. \tag{5.171}$$

c. If

$$\int f(x)\,dx = F(x) + c$$

on an interval $I_x$ and $\varphi: I_t \to I_x$ is a smooth (continuously differentiable)
mapping of the interval $I_t$ into $I_x$, then


$$\int (f \circ \varphi)(t)\varphi'(t)\,dt = (F \circ \varphi)(t) + c. \tag{5.172}$$


The equalities (5.170), (5.171), and (5.172) can be verified by differenti- 
ating the left- and right-hand sides using the linearity of differentiation in 
(5.170), the rule for differentiating a product in (5.171), and the rule for 
differentiating a composite function in (5.172). 

Just like the rules for differentiation, which make it possible to differen- 
tiate linear combinations, products, and compositions of known functions, 
relations (5.170), (5.171), and (5.172), as we shall see, make it possible in 
many cases to reduce the search for a primitive of a function either to the 
construction of primitives for simpler functions or to primitives that are al- 
ready known. A set of such known primitives can be provided, for example, 
by the following short table of indefinite integrals, obtained by rewriting the 
table of derivatives of the basic elementary functions (see Subsect. 5.2.3): 


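$$\int x^{\alpha}\,dx = \frac{1}{\alpha+1}x^{\alpha+1} + c \quad (\alpha \neq -1),$$

$$\int \frac{1}{x}\,dx = \ln|x| + c,$$

$$\int a^x\,dx = \frac{1}{\ln a}a^x + c \quad (0 < a \neq 1),$$

$$\int e^x\,dx = e^x + c,$$

$$\int \sin x\,dx = -\cos x + c,$$

$$\int \cos x\,dx = \sin x + c,$$

$$\int \frac{1}{\cos^2 x}\,dx = \tan x + c,$$

$$\int \frac{1}{\sin^2 x}\,dx = -\cot x + c,$$

$$\int \frac{1}{\sqrt{1-x^2}}\,dx = \begin{cases} \arcsin x + c, \\ -\arccos x + \tilde{c}, \end{cases}$$

$$\int \frac{1}{1+x^2}\,dx = \begin{cases} \arctan x + c, \\ -\operatorname{arccot} x + \tilde{c}, \end{cases}$$

$$\int \sinh x\,dx = \cosh x + c,$$

$$\int \cosh x\,dx = \sinh x + c,$$

$$\int \frac{1}{\cosh^2 x}\,dx = \tanh x + c,$$

$$\int \frac{1}{\sinh^2 x}\,dx = -\coth x + c,$$

$$\int \frac{1}{\sqrt{x^2 \pm 1}}\,dx = \ln\left|x + \sqrt{x^2 \pm 1}\right| + c,$$

$$\int \frac{1}{1-x^2}\,dx = \frac{1}{2}\ln\left|\frac{1+x}{1-x}\right| + c.$$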
Each of these formulas is used on the intervals of the real line R on which 
the corresponding integrand is defined. If more than one such interval exists, 
the constant c on the right-hand side may change from one interval to another. 
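Since the table is obtained by reading a table of derivatives backwards, any entry can be tested by differentiating its right-hand side. The following sketch does this numerically for a few entries, using a central difference quotient; the sample points and the tolerance are arbitrary choices, each point lying inside an interval on which the integrand is defined.

```python
import math

def derivative(F, x, h=1e-6):
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# (primitive F, integrand f, sample point inside an interval of definition)
TABLE = [
    (lambda x: math.log(abs(x)), lambda x: 1 / x, -2.0),
    (lambda x: math.tan(x), lambda x: 1 / math.cos(x) ** 2, 0.7),
    (lambda x: math.asin(x), lambda x: 1 / math.sqrt(1 - x * x), 0.3),
    (lambda x: math.tanh(x), lambda x: 1 / math.cosh(x) ** 2, 1.1),
    (lambda x: math.log(abs(x + math.sqrt(x * x - 1))),
     lambda x: 1 / math.sqrt(x * x - 1), -3.0),
    (lambda x: 0.5 * math.log(abs((1 + x) / (1 - x))),
     lambda x: 1 / (1 - x * x), 2.5),
]

errors = [abs(derivative(F, x) - f(x)) for F, f, x in TABLE]
print(max(errors))
```

The sample points $-2$, $-3$ and $2.5$ are chosen deliberately on intervals where the absolute value signs in the table matter.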

Let us now consider some examples that show relations (5.170), (5.171) 
and (5.172) in action. We begin with a preliminary remark. 

Given that, once a primitive has been found for a given function on an 
interval the other primitives can be found by adding constants, we shall agree 
to save writing below by adding the arbitrary constant only to the final result, 
which is a particular primitive of the given function. 


a. Linearity of the Indefinite Integral This heading means that by rela- 
tion (5.170) the primitive of a linear combination of functions can be found 
as the same linear combination of the primitives of the functions. 


Example 3. 


$$\int (a_0 + a_1 x + \cdots + a_n x^n)\,dx = a_0\int 1\,dx + a_1\int x\,dx + \cdots + a_n\int x^n\,dx =$$

$$= c + a_0 x + \frac{1}{2}a_1 x^2 + \cdots + \frac{1}{n+1}a_n x^{n+1}.$$


Example 4. 


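$$\int \left(x + \frac{1}{\sqrt{x}}\right)^2 dx = \int \left(x^2 + 2\sqrt{x} + \frac{1}{x}\right) dx =$$

$$= \int x^2\,dx + 2\int x^{1/2}\,dx + \int \frac{1}{x}\,dx = \frac{1}{3}x^3 + \frac{4}{3}x^{3/2} + \ln|x| + c.$$

Example 5.


$$\int \cos^2\frac{x}{2}\,dx = \int \frac{1}{2}(1 + \cos x)\,dx = \frac{1}{2}\int (1 + \cos x)\,dx =$$

$$= \frac{1}{2}\int 1\,dx + \frac{1}{2}\int \cos x\,dx = \frac{1}{2}x + \frac{1}{2}\sin x + c.$$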


b. Integration by Parts Formula (5.171) can be rewritten as


$$u(x)v(x) = \int u(x)\,dv(x) + \int v(x)\,du(x) + c,$$


or, what is the same, as

$$\int u(x)\,dv(x) = u(x)v(x) - \int v(x)\,du(x) + c. \tag{5.171'}$$


This means that in seeking a primitive for the function $u(x)v'(x)$ one
can reduce the problem to finding a primitive for $v(x)u'(x)$, throwing the
differentiation onto the other factor and partially integrating the function,
as shown in (5.171’), separating the term $u(x)v(x)$ when doing so. Formula
(5.171’) is called the formula for integration by parts.


Example 6. 


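$$\int \ln x\,dx = x\ln x - \int x\,d\ln x = x\ln x - \int x\cdot\frac{1}{x}\,dx =$$

$$= x\ln x - \int 1\,dx = x\ln x - x + c.$$


Example 7.


$$\int x^2 e^x\,dx = \int x^2\,de^x = x^2 e^x - \int e^x\,dx^2 = x^2 e^x - 2\int x e^x\,dx =$$

$$= x^2 e^x - 2\int x\,de^x = x^2 e^x - 2\left(x e^x - \int e^x\,dx\right) =$$

$$= x^2 e^x - 2x e^x + 2e^x + c = (x^2 - 2x + 2)e^x + c.$$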
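The computation in Example 7 repeats the same step until the polynomial factor has been differentiated away, and this is easy to mechanize. The sketch below is one possible implementation for integrands of the form $p(x)e^x$ with $p$ a polynomial; the representation of polynomials as coefficient lists (lowest degree first) and the function names are our own conventions.

```python
def poly_derivative(coeffs):
    """Derivative of the polynomial a0 + a1 x + ... + an x^n given as [a0, a1, ..., an]."""
    return [k * a for k, a in enumerate(coeffs)][1:]

def integrate_poly_times_exp(coeffs):
    """Return q such that the integral of p(x) e^x is q(x) e^x + c.

    Each integration by parts replaces p by p':
        int p de^x = p e^x - int p' e^x dx,
    so that q = p - p' + p'' - ... (a finite sum).
    """
    q = [0] * len(coeffs)
    sign, p = 1, list(coeffs)
    while p:
        for k, a in enumerate(p):
            q[k] += sign * a
        sign, p = -sign, poly_derivative(p)
    return q

# p(x) = x^2 gives q(x) = x^2 - 2x + 2, listed lowest degree first
print(integrate_poly_times_exp([0, 0, 1]))
```

The loop terminates because each differentiation shortens the coefficient list by one entry.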


c. Change of Variable in an Indefinite Integral Formula (5.172) shows
that in seeking a primitive for the function $(f \circ \varphi)(t)\cdot\varphi'(t)$ one may proceed
as follows:


$$\int (f \circ \varphi)(t)\cdot\varphi'(t)\,dt = \int f(\varphi(t))\,d\varphi(t) =$$
$$= \int f(x)\,dx = F(x) + c = F(\varphi(t)) + c,$$


that is, first make the change of variable $\varphi(t) = x$ in the integrand and pass
to the new variable $x$, then, after finding the primitive as a function of $x$,
return to the old variable $t$ by the substitution $x = \varphi(t)$.
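
Example 8.


$$\int \frac{t\,dt}{1+t^2} = \frac{1}{2}\int \frac{d(t^2+1)}{1+t^2} = \frac{1}{2}\int \frac{dx}{x} = \frac{1}{2}\ln|x| + c = \frac{1}{2}\ln(t^2+1) + c.$$


Example 9.


$$\int \frac{dx}{\sin x} = \int \frac{dx}{2\sin\frac{x}{2}\cos\frac{x}{2}} = \int \frac{d\left(\frac{x}{2}\right)}{\tan\frac{x}{2}\cos^2\frac{x}{2}} =$$

$$= \int \frac{du}{\tan u\cos^2 u} = \int \frac{d(\tan u)}{\tan u} = \int \frac{dv}{v} =$$

$$= \ln|v| + c = \ln|\tan u| + c = \ln\left|\tan\frac{x}{2}\right| + c.$$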


We have now considered several examples in which properties a, b, and c 
of the indefinite integral have been used individually. Actually, in the majority 
of cases, these properties are used together. 


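Example 10.


$$\int \sin 2x\cos 3x\,dx = \frac{1}{2}\int (\sin 5x - \sin x)\,dx =$$

$$= \frac{1}{2}\left(\frac{1}{5}\int \sin 5x\,d(5x) + \cos x\right) =$$

$$= \frac{1}{10}\int \sin u\,du + \frac{1}{2}\cos x = -\frac{1}{10}\cos u + \frac{1}{2}\cos x + c =$$

$$= \frac{1}{2}\cos x - \frac{1}{10}\cos 5x + c.$$


Example 11.


$$\int \arcsin x\,dx = x\arcsin x - \int x\,d\arcsin x =$$

$$= x\arcsin x - \int \frac{x}{\sqrt{1-x^2}}\,dx = x\arcsin x + \frac{1}{2}\int \frac{d(1-x^2)}{\sqrt{1-x^2}} =$$

$$= x\arcsin x + \frac{1}{2}\int u^{-1/2}\,du = x\arcsin x + u^{1/2} + c =$$

$$= x\arcsin x + \sqrt{1-x^2} + c.$$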


Example 12.


$$\int e^{ax}\cos bx\,dx = \frac{1}{a}\int \cos bx\,de^{ax} =$$

$$= \frac{1}{a}e^{ax}\cos bx - \frac{1}{a}\int e^{ax}\,d\cos bx = \frac{1}{a}e^{ax}\cos bx + \frac{b}{a}\int e^{ax}\sin bx\,dx =$$

$$= \frac{1}{a}e^{ax}\cos bx + \frac{b}{a^2}\int \sin bx\,de^{ax} = \frac{1}{a}e^{ax}\cos bx + \frac{b}{a^2}e^{ax}\sin bx -$$

$$- \frac{b}{a^2}\int e^{ax}\,d\sin bx = \frac{a\cos bx + b\sin bx}{a^2}e^{ax} - \frac{b^2}{a^2}\int e^{ax}\cos bx\,dx.$$

From this result we conclude that


$$\int e^{ax}\cos bx\,dx = \frac{a\cos bx + b\sin bx}{a^2+b^2}e^{ax} + c.$$


We could have arrived at this result by using Euler’s formula and the fact
that the primitive of the function $e^{(a+ib)x} = e^{ax}\cos bx + ie^{ax}\sin bx$ is


$$\frac{1}{a+ib}e^{(a+ib)x} = \frac{a-ib}{a^2+b^2}e^{(a+ib)x} =$$

$$= \frac{a\cos bx + b\sin bx}{a^2+b^2}e^{ax} + i\,\frac{a\sin bx - b\cos bx}{a^2+b^2}e^{ax}.$$


It will be useful to keep this in mind in the future. For real values of $x$ this
can easily be verified directly by differentiating the real and imaginary parts
of the function $\frac{1}{a+ib}e^{(a+ib)x}$.

In particular, we also find from this result that


$$\int e^{ax}\sin bx\,dx = \frac{a\sin bx - b\cos bx}{a^2+b^2}e^{ax} + c.$$
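The complex form of the computation lends itself to a quick numerical check. The sketch below evaluates $\frac{1}{a+ib}e^{(a+ib)x}$ with Python's `cmath` module and compares its real and imaginary parts with the two real formulas; the sample values of $a$, $b$, $x$ are arbitrary, and the function names are illustrative.

```python
import cmath
import math

def primitive(a, b, x):
    """Candidate primitive e^{(a+ib)x} / (a + ib) of e^{(a+ib)x}."""
    lam = complex(a, b)
    return cmath.exp(lam * x) / lam

def real_part_formula(a, b, x):
    """(a cos bx + b sin bx) e^{ax} / (a^2 + b^2)."""
    return (a * math.cos(b * x) + b * math.sin(b * x)) * math.exp(a * x) / (a * a + b * b)

def imag_part_formula(a, b, x):
    """(a sin bx - b cos bx) e^{ax} / (a^2 + b^2)."""
    return (a * math.sin(b * x) - b * math.cos(b * x)) * math.exp(a * x) / (a * a + b * b)

a, b, x = 2.0, 3.0, 0.4   # arbitrary sample values with a + ib != 0
F = primitive(a, b, x)
print(abs(F.real - real_part_formula(a, b, x)))
print(abs(F.imag - imag_part_formula(a, b, x)))
```

Both printed differences are at the level of rounding error, so one complex primitive yields the cosine and sine integrals simultaneously.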


Even the small set of examples we have considered suffices to show that 
in seeking primitives for even the elementary functions one is often obliged 
to resort to auxiliary transformations and clever devices, which was not at 
all the case in finding the derivatives of compositions of the functions whose 
derivatives we knew. It turns out that this difficulty is not accidental. For 
example, in contrast to differentiation, finding the primitive of an elementary 
function may lead to a function that is no longer a composition of elementary 
functions. For that reason, one should not conflate the phrase “finding a 
primitive” with the sometimes impossible task of “expressing the primitive 
of a given elementary function in terms of elementary functions”. In general, 
the class of elementary functions is a rather artificial object. There are very 
many special functions of importance in applications that have been studied 
and tabulated at least as well as, say, $\sin x$ or $e^x$.

For example, the sine integral $\operatorname{Si} x$ is the primitive $\int \frac{\sin x}{x}\,dx$ of the function
$\frac{\sin x}{x}$ that tends to zero as $x \to 0$. There exists such a primitive, but, like all
the other primitives of $\frac{\sin x}{x}$, it is not a composition of elementary functions.

Similarly, the function

$$\operatorname{Ci} x = \int \frac{\cos x}{x}\,dx,$$

specified by the condition $\operatorname{Ci} x \to 0$ as $x \to \infty$, is not elementary. The function
$\operatorname{Ci} x$ is called the cosine integral.

The primitive $\int \frac{dx}{\ln x}$ of the function $\frac{1}{\ln x}$ is also not elementary. One of
the primitives of this function is denoted $\operatorname{li} x$ and is called the logarithmic
integral. It satisfies the condition $\operatorname{li} x \to 0$ as $x \to +0$. (More details about
the functions $\operatorname{Si} x$, $\operatorname{Ci} x$, and $\operatorname{li} x$ will be given in Sect. 6.5.)

Because of these difficulties in finding primitives, rather extensive tables
of indefinite integrals have been compiled. However, in order to use these 
tables successfully and avoid having to resort to them when the problem is 
very simple, one must acquire some skill in dealing with indefinite integrals. 

The remainder of this section is devoted to integrating some special classes 
of functions whose primitives can be expressed as compositions of elementary 
functions. 


5.7.3 Primitives of Rational Functions 


Let us consider the problem of integrating $\int R(x)\,dx$, where $R(x) = \frac{P(x)}{Q(x)}$ is
a ratio of polynomials.

If we work in the domain of real numbers, then, without going outside
this domain, we can express every such fraction, as we know from algebra
(see formula (5.135) in Subsect. 5.5.4) as a sum


$$\frac{P(x)}{Q(x)} = p(x) + \sum_{j=1}^{l}\left(\sum_{k=1}^{k_j}\frac{a_{jk}}{(x-x_j)^k}\right) + \sum_{j=1}^{n}\left(\sum_{k=1}^{m_j}\frac{b_{jk}x + c_{jk}}{(x^2+p_j x+q_j)^k}\right), \tag{5.173}$$


where $p(x)$ is a polynomial (which arises when $P(x)$ is divided by $Q(x)$, but
only when the degree of $P(x)$ is not less than the degree of $Q(x)$), $a_{jk}$, $b_{jk}$,
and $c_{jk}$ are uniquely determined real numbers, and
$Q(x) = (x-x_1)^{k_1}\cdots(x-x_l)^{k_l}(x^2+p_1x+q_1)^{m_1}\cdots(x^2+p_nx+q_n)^{m_n}$.

We have already discussed how to find the expansion (5.173) in Sect. 5.5. 
Once the expansion (5.173) has been constructed, integrating R(x) reduces 
to integrating the individual terms. 

We have already integrated a polynomial in Example 3, so that it remains
only to consider the integration of fractions of the forms


$$\frac{1}{(x-a)^k} \quad\text{and}\quad \frac{bx+c}{(x^2+px+q)^k}, \quad\text{where } k \in \mathbb{N}.$$


The first of these problems can be solved immediately, since


$$\int \frac{1}{(x-a)^k}\,dx = \begin{cases} \frac{1}{-k+1}(x-a)^{-k+1} + c & \text{for } k \neq 1, \\ \ln|x-a| + c & \text{for } k = 1. \end{cases} \tag{5.174}$$


With the integral

$$\int \frac{bx+c}{(x^2+px+q)^k}\,dx$$

we proceed as follows. We represent the polynomial $x^2+px+q$ as $\left(x+\frac{1}{2}p\right)^2 + \left(q-\frac{1}{4}p^2\right)$,
where $q - \frac{1}{4}p^2 > 0$, since the polynomial $x^2+px+q$ has no real
roots. Setting $x + \frac{1}{2}p = u$ and $q - \frac{1}{4}p^2 = a^2$, we obtain


$$\int \frac{bx+c}{(x^2+px+q)^k}\,dx = \int \frac{\alpha u + \beta}{(u^2+a^2)^k}\,du,$$


where $\alpha = b$ and $\beta = c - \frac{1}{2}bp$.


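Next,

$$\int \frac{u}{(u^2+a^2)^k}\,du = \frac{1}{2}\int \frac{d(u^2+a^2)}{(u^2+a^2)^k} = \begin{cases} \frac{1}{2(1-k)}(u^2+a^2)^{-k+1} & \text{for } k \neq 1, \\ \frac{1}{2}\ln(u^2+a^2) & \text{for } k = 1, \end{cases} \tag{5.175}$$

and it remains only to study the integral

$$I_k = \int \frac{du}{(u^2+a^2)^k}. \tag{5.176}$$

Integrating by parts and making elementary transformations, we have

$$I_k = \int \frac{du}{(u^2+a^2)^k} = \frac{u}{(u^2+a^2)^k} + 2k\int \frac{u^2\,du}{(u^2+a^2)^{k+1}} =$$

$$= \frac{u}{(u^2+a^2)^k} + 2k\int \frac{(u^2+a^2) - a^2}{(u^2+a^2)^{k+1}}\,du = \frac{u}{(u^2+a^2)^k} + 2kI_k - 2ka^2 I_{k+1},$$

from which we obtain the recursion relation

$$I_{k+1} = \frac{1}{2ka^2}\,\frac{u}{(u^2+a^2)^k} + \frac{2k-1}{2ka^2}I_k, \tag{5.177}$$


which makes it possible to lower the exponent $k$ in the integral (5.176). But
$I_1$ is easy to compute:


$$I_1 = \int \frac{du}{u^2+a^2} = \frac{1}{a}\int \frac{d\left(\frac{u}{a}\right)}{\left(\frac{u}{a}\right)^2+1} = \frac{1}{a}\arctan\frac{u}{a} + c. \tag{5.178}$$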

Thus, by using (5.177) and (5.178), one can also compute the primitive 
(5.176). 
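The recursion (5.177) together with the starting value (5.178) translates directly into a short loop. The following sketch computes $I_k$ in this way (with the constant of integration set to zero) and checks the result against the integrand by a difference quotient; the function names, sample point and step are illustrative assumptions.

```python
import math

def I(k, u, a):
    """Primitive I_k(u) of 1 / (u^2 + a^2)^k via (5.177) and (5.178), with c = 0."""
    value = math.atan(u / a) / a                      # I_1, formula (5.178)
    for j in range(1, k):                             # pass from I_j to I_{j+1} by (5.177)
        value = (u / (u * u + a * a) ** j + (2 * j - 1) * value) / (2 * j * a * a)
    return value

def integrand(k, u, a):
    return 1.0 / (u * u + a * a) ** k

def check(k, u, a, h=1e-5):
    """Difference between a central difference quotient of I_k and the integrand."""
    return abs((I(k, u + h, a) - I(k, u - h, a)) / (2 * h) - integrand(k, u, a))

print([check(k, 0.8, 1.5) for k in (1, 2, 3, 4)])
```

For $k = 2$ and $a = 1$ the loop reproduces $I_2 = \frac{1}{2}\frac{u}{u^2+1} + \frac{1}{2}\arctan u$.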
Thus we have proved the following proposition. 


Proposition 2. The primitive of any rational function $R(x) = \frac{P(x)}{Q(x)}$ can
be expressed in terms of rational functions and the transcendental functions
$\ln$ and $\arctan$. The rational part of the primitive, when placed over a common
denominator, will have a denominator containing all the factors of the
polynomial $Q(x)$ with multiplicities one less than they have in $Q(x)$.


Example 13. Let us calculate $\int \frac{2x^2+5x+5}{(x^2-1)(x+2)}\,dx$.

Since the integrand is a proper fraction, and the factorization of the denominator
into the product $(x-1)(x+1)(x+2)$ is also known, we immediately
seek a partial fraction expansion


$$\frac{2x^2+5x+5}{(x-1)(x+1)(x+2)} = \frac{A}{x-1} + \frac{B}{x+1} + \frac{C}{x+2}. \tag{5.179}$$


Putting the right-hand side of Eq. (5.179) over a common denominator, 
we have 


$$\frac{2x^2+5x+5}{(x-1)(x+1)(x+2)} = \frac{(A+B+C)x^2 + (3A+B)x + (2A-2B-C)}{(x-1)(x+1)(x+2)}.$$


Equating the corresponding coefficients in the numerators, we obtain the
system

$$\begin{cases} A + B + C = 2, \\ 3A + B = 5, \\ 2A - 2B - C = 5, \end{cases}$$


from which we find (A, B,C) = (2, —1, 1). 

We remark that in this case these numbers could have been found in 
one’s head. Indeed, multiplying (5.179) by x — 1 and then setting x = 1 in 
the resulting equality, we would have A on the right-hand side, while the 
left-hand side would have been the value at x = 1 of the fraction obtained 
by striking out the factor $x - 1$ in the denominator, that is, $A = \frac{2+5+5}{2\cdot 3} = 2$.
One could proceed similarly to find B and C. 

Thus, 


$$\int \frac{2x^2+5x+5}{(x^2-1)(x+2)}\,dx = 2\int \frac{dx}{x-1} - \int \frac{dx}{x+1} + \int \frac{dx}{x+2} =$$

$$= 2\ln|x-1| - \ln|x+1| + \ln|x+2| + c = \ln\left|\frac{(x-1)^2(x+2)}{x+1}\right| + c.$$
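The shortcut described in the remark (strike out one factor of the denominator and evaluate what is left at the corresponding root) can be written as a small routine. The sketch below applies it to the fraction of Example 13 using exact rational arithmetic; it assumes all roots of the denominator are simple, and the function names are hypothetical.

```python
from fractions import Fraction

def numerator(x):
    """Numerator 2x^2 + 5x + 5 of the fraction in Example 13."""
    return 2 * x * x + 5 * x + 5

def cover_up(roots):
    """Coefficients A_j in P(x) / prod(x - x_j) = sum A_j / (x - x_j), simple roots only."""
    coeffs = []
    for r in roots:
        denom = Fraction(1)
        for s in roots:
            if s != r:
                denom *= (r - s)      # the factor (x - r) is struck out; the rest is evaluated at r
        coeffs.append(Fraction(numerator(r)) / denom)
    return coeffs

A, B, C = cover_up([Fraction(1), Fraction(-1), Fraction(-2)])
print(A, B, C)
```

The three values agree with the solution $(A, B, C) = (2, -1, 1)$ of the linear system above.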


Example 14. Let us compute a primitive of the function


$$R(x) = \frac{x^7 - 2x^6 + 4x^5 - 5x^4 + 4x^3 - 5x^2 - x}{(x-1)^2(x^2+1)^2}.$$


We begin by remarking that this is an improper fraction, so that, removing
the parentheses and finding the denominator $Q(x) = x^6 - 2x^5 + 3x^4 - 4x^3 + 3x^2 - 2x + 1$,
we divide the numerator by it, after which we obtain


$$R(x) = x + \frac{x^5 - x^4 + x^3 - 3x^2 - 2x}{(x-1)^2(x^2+1)^2},$$


and we then seek a partial-fraction expansion of the proper fraction


$$\frac{x^5 - x^4 + x^3 - 3x^2 - 2x}{(x-1)^2(x^2+1)^2} = \frac{A}{(x-1)^2} + \frac{B}{x-1} + \frac{Cx+D}{(x^2+1)^2} + \frac{Ex+F}{x^2+1}. \tag{5.180}$$

Of course the expansion could be obtained in the canonical way, by writing
out a system of six equations in six unknowns. However, instead of doing that, 
we shall demonstrate some other technical possibilities that are sometimes 
used. 

We find the coefficient $A$ by multiplying Eq. (5.180) by $(x-1)^2$ and then
setting $x = 1$. The result is $A = -1$. We then transpose the fraction $\frac{A}{(x-1)^2}$, in
which $A$ is now the known quantity $-1$, to the left-hand side of Eq. (5.180).
We then have


$$\frac{x^4 + x^3 + 2x^2 + x - 1}{(x-1)(x^2+1)^2} = \frac{B}{x-1} + \frac{Cx+D}{(x^2+1)^2} + \frac{Ex+F}{x^2+1}, \tag{5.181}$$

from which, multiplying (5.181) by $x - 1$ and then setting $x = 1$, we find
$B = 1$.

Now, transposing the fraction $\frac{1}{x-1}$ to the left-hand side of (5.181), we
obtain

$$\frac{x^2+x+2}{(x^2+1)^2} = \frac{Cx+D}{(x^2+1)^2} + \frac{Ex+F}{x^2+1}. \tag{5.182}$$


Now, after putting the right-hand side of (5.182) over a common denominator,
we equate the numerators


$$x^2 + x + 2 = Ex^3 + Fx^2 + (C+E)x + (D+F),$$


from which it follows that

$$E = 0, \quad F = 1, \quad C + E = 1, \quad D + F = 2,$$

or $(C, D, E, F) = (1, 1, 0, 1)$.
We now know all the coefficients in (5.180). Upon integration, the first
two fractions yield respectively $\frac{1}{x-1}$ and $\ln|x-1|$. Then


$$\int \frac{Cx+D}{(x^2+1)^2}\,dx = \int \frac{x+1}{(x^2+1)^2}\,dx = \frac{1}{2}\int \frac{d(x^2+1)}{(x^2+1)^2} + \int \frac{dx}{(x^2+1)^2} = \frac{-1}{2(x^2+1)} + I_2,$$


where


$$I_2 = \int \frac{dx}{(x^2+1)^2} = \frac{1}{2}\,\frac{x}{x^2+1} + \frac{1}{2}\arctan x,$$


which follows from (5.177) and (5.178).


Finally,

$$\int \frac{Ex+F}{x^2+1}\,dx = \int \frac{1}{x^2+1}\,dx = \arctan x.$$

Gathering all the integrals, we finally have


$$\int R(x)\,dx = \frac{1}{2}x^2 + \frac{1}{x-1} + \frac{x-1}{2(x^2+1)} + \ln|x-1| + \frac{3}{2}\arctan x + c.$$

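A result of this length is worth checking by differentiation. The sketch below does so numerically: it takes the primitive assembled term by term from the expansion (5.180) with $(A, B, C, D, E, F) = (-1, 1, 1, 1, 0, 1)$, written out explicitly in the code, and compares its difference quotient with $R(x)$ at a few sample points chosen away from the pole $x = 1$.

```python
import math

def R(x):
    """The rational function of Example 14."""
    num = x**7 - 2*x**6 + 4*x**5 - 5*x**4 + 4*x**3 - 5*x**2 - x
    return num / ((x - 1) ** 2 * (x * x + 1) ** 2)

def primitive(x):
    """x^2/2 + 1/(x-1) + ln|x-1| + (x-1)/(2(x^2+1)) + (3/2) arctan x."""
    return (0.5 * x * x + 1 / (x - 1) + math.log(abs(x - 1))
            + (x - 1) / (2 * (x * x + 1)) + 1.5 * math.atan(x))

def check(x, h=1e-5):
    """Difference between a central difference quotient of the primitive and R(x)."""
    return abs((primitive(x + h) - primitive(x - h)) / (2 * h) - R(x))

print([check(x) for x in (-2.0, 0.3, 2.5)])
```

The term $\frac{x-1}{2(x^2+1)}$ combines the contributions $-\frac{1}{2(x^2+1)}$ and $\frac{x}{2(x^2+1)}$ of the fraction with denominator $(x^2+1)^2$.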

Let us now consider some frequently encountered indefinite integrals
whose computation can be reduced to finding the primitive of a rational
function.


5.7.4 Primitives of the Form $\int R(\cos x, \sin x)\,dx$


Let $R(u, v)$ be a rational function in $u$ and $v$, that is, a quotient of polynomials
$\frac{P(u,v)}{Q(u,v)}$, which are linear combinations of monomials $u^m v^n$, where
$m = 0, 1, 2, \ldots$ and $n = 0, 1, \ldots$.

Several methods exist for computing the integral $\int R(\cos x, \sin x)\,dx$, one
of which is completely general, although not always the most efficient. 


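a. We make the change of variable $t = \tan\frac{x}{2}$. Since


$$\cos x = \frac{1 - \tan^2\frac{x}{2}}{1 + \tan^2\frac{x}{2}}, \qquad \sin x = \frac{2\tan\frac{x}{2}}{1 + \tan^2\frac{x}{2}},$$

$$dt = \frac{dx}{2\cos^2\frac{x}{2}}, \quad\text{that is,}\quad dx = \frac{2\,dt}{1 + \tan^2\frac{x}{2}},$$


it follows that


$$\int R(\cos x, \sin x)\,dx = \int R\left(\frac{1-t^2}{1+t^2}, \frac{2t}{1+t^2}\right)\frac{2}{1+t^2}\,dt,$$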

and the problem has been reduced to integrating a rational function. 

However, this way leads to a very cumbersome rational function; for that 
reason one should keep in mind that in many cases there are other possibilities 
for rationalizing the integral. 


b. In the case of integrals of the form $\int R(\cos^2 x, \sin^2 x)\,dx$ or
$\int r(\tan x)\,dx$, where $r(u)$ is a rational function, a convenient substitution
is $t = \tan x$, since


$$\cos^2 x = \frac{1}{1 + \tan^2 x}, \qquad \sin^2 x = \frac{\tan^2 x}{1 + \tan^2 x},$$

$$dt = \frac{dx}{\cos^2 x}, \quad\text{that is,}\quad dx = \frac{dt}{1+t^2}.$$


Carrying out this substitution, we obtain respectively

$$\int R(\cos^2 x, \sin^2 x)\,dx = \int R\left(\frac{1}{1+t^2}, \frac{t^2}{1+t^2}\right)\frac{dt}{1+t^2},$$

$$\int r(\tan x)\,dx = \int r(t)\frac{dt}{1+t^2}.$$

c. In the case of integrals of the form

$$\int R(\cos x, \sin^2 x)\sin x\,dx \quad\text{or}\quad \int R(\cos^2 x, \sin x)\cos x\,dx$$


one can move the functions $\sin x$ and $\cos x$ into the differential and make the
substitution $t = \cos x$ or $t = \sin x$ respectively. After these substitutions, the
integrals will have the form


$$-\int R(t, 1-t^2)\,dt \quad\text{or}\quad \int R(1-t^2, t)\,dt.$$


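Example 15.


$$\int \frac{dx}{3 + \sin x} = \int \frac{1}{3 + \frac{2t}{1+t^2}}\cdot\frac{2\,dt}{1+t^2} =$$

$$= 2\int \frac{dt}{3t^2+2t+3} = \frac{2}{3}\int \frac{d\left(t+\frac{1}{3}\right)}{\left(t+\frac{1}{3}\right)^2 + \frac{8}{9}} = \frac{2}{3}\int \frac{du}{u^2 + \left(\frac{2\sqrt{2}}{3}\right)^2} =$$

$$= \frac{1}{\sqrt{2}}\arctan\frac{3u}{2\sqrt{2}} + c = \frac{1}{\sqrt{2}}\arctan\frac{3t+1}{2\sqrt{2}} + c = \frac{1}{\sqrt{2}}\arctan\frac{3\tan\frac{x}{2}+1}{2\sqrt{2}} + c.$$


Here we have used the universal change of variable $t = \tan\frac{x}{2}$.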
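As a sanity check on the universal substitution, the primitive found in Example 15 can be differentiated numerically and compared with $\frac{1}{3+\sin x}$. The sketch below does this on the interval $]-\pi, \pi[$, where $\tan\frac{x}{2}$ is defined; the sample points and step size are arbitrary choices.

```python
import math

def primitive(x):
    """arctan((3 tan(x/2) + 1) / (2 sqrt 2)) / sqrt 2, for -pi < x < pi."""
    t = math.tan(x / 2)
    return math.atan((3 * t + 1) / (2 * math.sqrt(2))) / math.sqrt(2)

def check(x, h=1e-6):
    """Difference between a central difference quotient of the primitive and 1/(3 + sin x)."""
    numeric = (primitive(x + h) - primitive(x - h)) / (2 * h)
    return abs(numeric - 1 / (3 + math.sin(x)))

print([check(x) for x in (-2.0, 0.0, 1.0, 2.5)])
```

Note that although the integrand is continuous on the whole line, this formula gives a primitive only between consecutive points $x = \pi + 2\pi n$, where $\tan\frac{x}{2}$ is undefined.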


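Example 16.


$$\int \frac{dx}{(\sin x + \cos x)^2} = \int \frac{dx}{\cos^2 x(\tan x + 1)^2} =$$

$$= \int \frac{d\tan x}{(\tan x + 1)^2} = \int \frac{dt}{(t+1)^2} = -\frac{1}{t+1} + c = c - \frac{1}{1+\tan x}.$$


Example 17.


$$\int \frac{dx}{2\sin^2 3x - 3\cos^2 3x + 1} = \int \frac{dx}{\cos^2 3x\,(2\tan^2 3x - 3 + (1 + \tan^2 3x))} =$$

$$= \frac{1}{3}\int \frac{d\tan 3x}{3\tan^2 3x - 2} = \frac{1}{3}\int \frac{dt}{3t^2 - 2} = \frac{1}{3\cdot 2}\sqrt{\frac{2}{3}}\int \frac{d\left(\sqrt{\frac{3}{2}}\,t\right)}{\frac{3}{2}t^2 - 1} =$$

$$= \frac{1}{3\sqrt{6}}\int \frac{du}{u^2 - 1} = \frac{1}{6\sqrt{6}}\ln\left|\frac{u-1}{u+1}\right| + c = \frac{1}{6\sqrt{6}}\ln\left|\frac{\sqrt{\frac{3}{2}}\tan 3x - 1}{\sqrt{\frac{3}{2}}\tan 3x + 1}\right| + c.$$


Example 18.


$$\int \frac{\cos^3 x}{\sin^7 x}\,dx = \int \frac{\cos^2 x\,d\sin x}{\sin^7 x} = \int \frac{(1-t^2)\,dt}{t^7} =$$

$$= \int (t^{-7} - t^{-5})\,dt = -\frac{1}{6}t^{-6} + \frac{1}{4}t^{-4} + c = \frac{1}{4\sin^4 x} - \frac{1}{6\sin^6 x} + c.$$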




5.7.5 Primitives of the Form $\int R(x, y(x))\,dx$


Let $R(x, y)$ be, as in Subsect. 5.7.4, a rational function. Let us consider some
special integrals of the form


$$\int R(x, y(x))\,dx,$$


where $y = y(x)$ is a function of $x$.

First of all, it is clear that if one can make a change of variable $x = x(t)$
such that both functions $x = x(t)$ and $y = y(x(t))$ are rational functions of
$t$, then $x'(t)$ is also a rational function and


$$\int R(x, y(x))\,dx = \int R(x(t), y(x(t)))\,x'(t)\,dt,$$


that is, the problem will have been reduced to integrating a rational function. 
Consider the following special choices of the function y = y(x). 


a. If $y = \sqrt[n]{\frac{ax+b}{cx+d}}$, where $n \in \mathbb{N}$, then, setting $t^n = \frac{ax+b}{cx+d}$, we obtain

$$x = \frac{d\cdot t^n - b}{a - c\cdot t^n}, \qquad y = t,$$


and the integrand rationalizes.


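Example 19.

$$\int \sqrt[3]{\frac{x-1}{x+1}}\,dx = \int t\,d\left(\frac{t^3+1}{1-t^3}\right) = t\cdot\frac{t^3+1}{1-t^3} - \int \frac{t^3+1}{1-t^3}\,dt =$$

$$= t\cdot\frac{t^3+1}{1-t^3} + \int \left(1 - \frac{2}{1-t^3}\right)dt = t\cdot\frac{t^3+1}{1-t^3} + t - 2\int \frac{dt}{(1-t)(1+t+t^2)} =$$

$$= \frac{2t}{1-t^3} - 2\int \left(\frac{1}{3(1-t)} + \frac{2+t}{3(1+t+t^2)}\right)dt =$$

$$= \frac{2t}{1-t^3} + \frac{2}{3}\ln|1-t| - \frac{2}{3}\int \frac{\left(t+\frac{1}{2}\right) + \frac{3}{2}}{\left(t+\frac{1}{2}\right)^2 + \frac{3}{4}}\,dt =$$

$$= \frac{2t}{1-t^3} + \frac{2}{3}\ln|1-t| - \frac{1}{3}\ln\left[\left(t+\frac{1}{2}\right)^2 + \frac{3}{4}\right] - \frac{2}{\sqrt{3}}\arctan\frac{2}{\sqrt{3}}\left(t+\frac{1}{2}\right) + c, \quad\text{where } t = \sqrt[3]{\frac{x-1}{x+1}}.$$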


b. Let us now consider the case when $y = \sqrt{ax^2+bx+c}$, that is, integrals
of the form


$$\int R\left(x, \sqrt{ax^2+bx+c}\right)dx.$$

By completing the square in the trinomial $ax^2+bx+c$ and making a suitable
linear substitution, we reduce the general case to one of the following three
simple cases:


$$\int R\left(t, \sqrt{t^2+1}\right)dt, \qquad \int R\left(t, \sqrt{t^2-1}\right)dt, \qquad \int R\left(t, \sqrt{1-t^2}\right)dt. \tag{5.183}$$


To rationalize these integrals it now suffices to make the following substitutions,
respectively:


$$\sqrt{t^2+1} = tu+1, \quad\text{or}\quad \sqrt{t^2+1} = tu-1, \quad\text{or}\quad \sqrt{t^2+1} = t-u;$$
$$\sqrt{t^2-1} = u(t-1), \quad\text{or}\quad \sqrt{t^2-1} = u(t+1), \quad\text{or}\quad \sqrt{t^2-1} = t-u;$$
$$\sqrt{1-t^2} = u(1-t), \quad\text{or}\quad \sqrt{1-t^2} = u(1+t), \quad\text{or}\quad \sqrt{1-t^2} = tu \pm 1.$$


These substitutions were proposed long ago by Euler (see Problem 3 at 
the end of this section). 

Let us verify, for example, that after the first substitution we will have 
reduced the first integral to the integral of a rational function. 

In fact, if $\sqrt{t^2+1} = tu + 1$, then $t^2 + 1 = t^2u^2 + 2tu + 1$, from which we
find

$$t = \frac{2u}{1-u^2}$$

and then

$$\sqrt{t^2+1} = \frac{1+u^2}{1-u^2} .$$

Thus t and $\sqrt{t^2+1}$ have been expressed rationally in terms of u, and
consequently the integral has been reduced to the integral of a rational function.
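The two rational expressions obtained from the first Euler substitution can be spot-checked numerically. This is a minimal sketch: the function name and the sample values of u are illustrative assumptions, and u is restricted to |u| < 1 so that (1 + u²)/(1 − u²) is positive, as a square root must be.

```python
import math

def first_euler_substitution(u):
    """Return (t, root) for sqrt(t**2 + 1) = t*u + 1, valid for |u| < 1."""
    t = 2.0 * u / (1.0 - u * u)
    root = (1.0 + u * u) / (1.0 - u * u)
    return t, root

results = []
for u in (-0.9, -0.3, 0.0, 0.4, 0.8):
    t, root = first_euler_substitution(u)
    # root should agree both with sqrt(t^2 + 1) and with t*u + 1
    results.append((u, t, root, math.sqrt(t * t + 1.0), t * u + 1.0))
    print(results[-1])
```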

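The integrals (5.183) can also be reduced, by means of the substitutions
t = sinh φ, t = cosh φ, and t = sin φ (or t = cos φ) respectively, to the
following forms:

$$\int R(\sinh\varphi, \cosh\varphi)\cosh\varphi\,d\varphi , \qquad \int R(\cosh\varphi, \sinh\varphi)\sinh\varphi\,d\varphi$$

and

$$\int R(\sin\varphi, \cos\varphi)\cos\varphi\,d\varphi \quad\text{or}\quad -\int R(\cos\varphi, \sin\varphi)\sin\varphi\,d\varphi .$$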


Example 20. 


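$$\int \frac{dx}{x+\sqrt{x^2+2x+2}} = \int \frac{dx}{x+\sqrt{(x+1)^2+1}} = \int \frac{dt}{t-1+\sqrt{t^2+1}} .$$

Setting $\sqrt{t^2+1} = u - t$, we have $1 = u^2 - 2tu$, from which it follows that
$t = \frac{u^2-1}{2u}$. Therefore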




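$$
\begin{aligned}
\int \frac{dt}{t-1+\sqrt{t^2+1}}
&= \frac12\int \frac{1}{u-1}\left(1+\frac{1}{u^2}\right)du
 = \frac12\int \frac{du}{u-1} + \frac12\int \frac{du}{u^2(u-1)} = \\
&= \frac12\ln|u-1| + \frac12\int \left(\frac{1}{u-1} - \frac{1}{u^2} - \frac1u\right)du = \\
&= \frac12\ln|u-1| + \frac12\ln\left|\frac{u-1}{u}\right| + \frac{1}{2u} + c .
\end{aligned}
$$

It now remains to retrace the path of substitutions: $u = t + \sqrt{t^2+1}$ and
t = x + 1.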
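As a sanity check on Example 20, one can differentiate the resulting primitive numerically and compare it with the integrand. The sketch below assumes the primitive ½ ln|u − 1| + ½ ln|(u − 1)/u| + 1/(2u) with u = x + 1 + √(x² + 2x + 2), obtained by carrying the partial-fraction computation through; the function names and the test points (all to the right of the singularity at x = −1) are illustrative choices.

```python
import math

def integrand(x):
    return 1.0 / (x + math.sqrt(x * x + 2.0 * x + 2.0))

def primitive(x):
    t = x + 1.0                       # t = x + 1
    u = t + math.sqrt(t * t + 1.0)    # u = t + sqrt(t^2 + 1)
    return (0.5 * math.log(abs(u - 1.0))
            + 0.5 * math.log(abs((u - 1.0) / u))
            + 0.5 / u)

def central_difference(F, x, h=1e-5):
    """Second-order numerical derivative of F at x."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

for x in (0.0, 1.0, 3.0):
    print(x, central_difference(primitive, x), integrand(x))
```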


c. Elliptic integrals. Another important class of integrals consists of
those of the form

$$\int R\bigl(x, \sqrt{P(x)}\bigr)\,dx , \tag{5.184}$$


where P(x) is a polynomial of degree n > 2. As Abel and Liouville showed, 
such an integral cannot in general be expressed in terms of elementary func- 
tions. 

For n = 3 and n = 4 the integral (5.184) is called an elliptic integral, and 
for n > 4 it is called hyperelliptic. 

It can be shown that by elementary substitutions the general elliptic in- 
tegral can be reduced to the following three standard forms up to terms 
expressible in elementary functions: 


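$$\int \frac{dx}{\sqrt{(1-x^2)(1-k^2x^2)}} , \tag{5.185}$$

$$\int \frac{x^2\,dx}{\sqrt{(1-x^2)(1-k^2x^2)}} , \tag{5.186}$$

$$\int \frac{dx}{(1-hx^2)\sqrt{(1-x^2)(1-k^2x^2)}} , \tag{5.187}$$

where h and k are parameters, the parameter k lying in the interval ]0, 1[ in
all three cases.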

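By the substitution x = sin φ these integrals can be reduced to the
following canonical integrals and combinations of them:

$$\int \frac{d\varphi}{\sqrt{1-k^2\sin^2\varphi}} , \tag{5.188}$$

$$\int \sqrt{1-k^2\sin^2\varphi}\,d\varphi , \tag{5.189}$$

$$\int \frac{d\varphi}{(1-h\sin^2\varphi)\sqrt{1-k^2\sin^2\varphi}} . \tag{5.190}$$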


The integrals (5.188), (5.189) and (5.190) are called respectively the elliptic
integral of first kind, second kind, and third kind (in the Legendre form).
The symbols F(k, φ) and E(k, φ) respectively denote the particular elliptic
integrals (5.188) and (5.189) of first and second kind that satisfy F(k, 0) = 0
and E(k, 0) = 0.

The functions F(k, φ) and E(k, φ) are frequently used, and for that reason
very detailed tables of their values have been compiled for 0 < k < 1 and
0 ≤ φ ≤ π/2.
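In place of tables, values of F(k, φ) and E(k, φ) can be approximated by numerical quadrature. The following is a minimal sketch using a composite Simpson rule; the helper `simpson`, the number of subintervals, and the sample arguments are assumptions made for illustration, not a method taken from the text.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def F(k, phi):
    """Elliptic integral of the first kind (5.188), normalized by F(k, 0) = 0."""
    return simpson(lambda s: 1.0 / math.sqrt(1.0 - (k * math.sin(s)) ** 2), 0.0, phi)

def E(k, phi):
    """Elliptic integral of the second kind (5.189), normalized by E(k, 0) = 0."""
    return simpson(lambda s: math.sqrt(1.0 - (k * math.sin(s)) ** 2), 0.0, phi)

print(F(0.5, math.pi / 2), E(0.5, math.pi / 2))
```

For k = 0 both integrands equal 1, so F(0, φ) = E(0, φ) = φ, and for k = 1 one has E(1, φ) = sin φ on [0, π/2]; these limiting cases make convenient checks.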

As Abel showed, it is natural to study elliptic integrals in the complex
domain, in intimate connection with the so-called elliptic functions, which
are related to the elliptic integrals exactly as the function sin x, for example,
is related to the integral $\int \frac{d\varphi}{\sqrt{1-\varphi^2}} = \arcsin\varphi$.


5.7.6 Problems and Exercises 


1. Ostrogradskii's³³ method of separating off the rational part of the integral of a
proper rational fraction.

Let $\frac{P(x)}{Q(x)}$ be a proper rational fraction, let q(x) be the polynomial having the
same roots as Q(x), but with multiplicity 1, and let $Q_1(x) = \frac{Q(x)}{q(x)}$.
Show that
a) the following formula of Ostrogradskii holds:

$$\int \frac{P(x)}{Q(x)}\,dx = \frac{P_1(x)}{Q_1(x)} + \int \frac{p(x)}{q(x)}\,dx , \tag{5.191}$$

where $\frac{P_1(x)}{Q_1(x)}$ and $\frac{p(x)}{q(x)}$ are proper rational fractions and $\int \frac{p(x)}{q(x)}\,dx$ is a transcendental
function.
(Because of this result, the fraction $\frac{P_1(x)}{Q_1(x)}$ in (5.191) is called the rational part
of the integral $\int \frac{P(x)}{Q(x)}\,dx$.)

b) In the formula

$$\frac{P(x)}{Q(x)} = \left(\frac{P_1(x)}{Q_1(x)}\right)' + \frac{p(x)}{q(x)}$$

obtained by differentiating Ostrogradskii's formula, the fraction $\left(\frac{P_1(x)}{Q_1(x)}\right)'$ can be
given the denominator Q(x) after suitable cancellations.


c) The polynomials q(x), $Q_1(x)$, and then also the polynomials p(x), $P_1(x)$ can
be found algebraically, without even knowing the roots of Q(x). Thus the rational
part of the integral (5.191) can be found completely without even computing the
whole primitive.


d) Separate off the rational part of the integral (5.191) if

$$P(x) = 2x^6 + 3x^5 + 6x^4 + 6x^3 + 10x^2 + 3x + 2 ,$$

$$Q(x) = x^7 + 3x^6 + 5x^5 + 7x^4 + 7x^3 + 5x^2 + 3x + 1$$

(see Example 17 in Sect. 5.5).


33 M. V. Ostrogradskii (1801-1861): prominent Russian specialist in theoretical
mechanics and mathematician, one of the founders of the applied area of research
in the Petersburg mathematical school.

2. Suppose we are seeking the primitive

$$\int R(\cos x, \sin x)\,dx , \tag{5.192}$$

where $R(u,v) = \frac{P(u,v)}{Q(u,v)}$ is a rational function.

Show that

a) if R(−u, v) = R(u, v), then R(u, v) has the form $R_1(u^2, v)$;

b) if R(−u, v) = −R(u, v), then $R(u,v) = u\cdot R_2(u^2, v)$ and the substitution
t = sin x rationalizes the integral (5.192);

c) if R(−u, −v) = R(u, v), then $R(u,v) = R_3\bigl(\frac{u}{v}, v^2\bigr)$, and the substitution
t = tan x rationalizes the integral (5.192).


3. Integrals of the form

$$\int R\bigl(x, \sqrt{ax^2+bx+c}\bigr)\,dx . \tag{5.193}$$

a) Verify that the integral (5.193) can be reduced to the integral of a rational
function by the following Euler substitutions:

$t = \sqrt{ax^2+bx+c} \pm \sqrt{a}\,x$, if a > 0,

$t = \sqrt{\frac{x-x_1}{x-x_2}}$, if $x_1$ and $x_2$ are real roots of the trinomial $ax^2 + bx + c$.

b) Let $(x_0, y_0)$ be a point of the curve $y^2 = ax^2 + bx + c$ and t the slope of the
line passing through $(x_0, y_0)$ and intersecting this curve in the point (x, y). Express
the coordinates (x, y) in terms of $(x_0, y_0)$ and t and connect these formulas with
Euler's substitutions.

c) A curve defined by an algebraic equation P(x, y) = 0 is unicursal if it admits
a parametric description x = x(t), y = y(t) in terms of rational functions x(t) and
y(t). Show that the integral $\int R(x, y(x))\,dx$, where R(u, v) is a rational function
and y(x) is an algebraic function satisfying the equation P(x, y) = 0 that defines
the unicursal curve, can be reduced to the integral of a rational function.

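d) Show that the integral (5.193) can always be reduced to computing integrals
of the following three types:

$$\int \frac{P(x)}{\sqrt{ax^2+bx+c}}\,dx , \qquad \int \frac{dx}{(x-x_0)^k\cdot\sqrt{ax^2+bx+c}} , \qquad \int \frac{(Ax+B)\,dx}{(x^2+px+q)^m\cdot\sqrt{ax^2+bx+c}} .$$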


4. a) Show that the integral

$$\int x^m (a + bx^n)^p\,dx$$

whose differential is a binomial, where m, n, and p are rational numbers, can be
reduced to the integral

$$\int (a+bt)^p\,t^q\,dt , \tag{5.194}$$

where p and q are rational numbers.

b) The integral (5.194) can be expressed in terms of elementary functions if one
of the three numbers p, q, and p + q is an integer. (Chebyshev showed that there
were no other cases in which the integral (5.194) could be expressed in elementary
functions.)


5. Elliptic integrals. 


a) Any polynomial of degree three with real coefficients has a real root $x_0$, and
can be reduced to a polynomial of the form $t^2(at^4 + bt^3 + ct^2 + dt + e)$, where a ≠ 0,
by the substitution $x - x_0 = t^2$.


b) If R(u, v) is a rational function and P a polynomial of degree 3 or 4, the
function $R\bigl(x, \sqrt{P(x)}\bigr)$ can be reduced to the form $R_1\bigl(t, \sqrt{at^4 + bt^3 + \cdots + e}\bigr)$,
where a ≠ 0.


c) A fourth-degree polynomial $ax^4 + bx^3 + \cdots + e$ can be represented as a
product $a(x^2 + p_1x + q_1)(x^2 + p_2x + q_2)$ and can always be brought into the form
$\frac{(M_1+N_1t^2)(M_2+N_2t^2)}{(\gamma t+1)^4}$ by a substitution $x = \frac{\alpha t+\beta}{\gamma t+1}$.


d) A function $R\bigl(x, \sqrt{ax^4 + bx^3 + \cdots + e}\bigr)$ can be reduced to the form

$$R_1\Bigl(t, \sqrt{A(1+m_1t^2)(1+m_2t^2)}\Bigr)$$

by a substitution $x = \frac{\alpha t+\beta}{\gamma t+1}$.

e) A function $R(x, \sqrt{y})$ can be represented as a sum $R_1(x,y) + \frac{R_2(x,y)}{\sqrt{y}}$, where
$R_1$ and $R_2$ are rational functions.


f) Any rational function can be represented as the sum of even and odd rational 
functions. 


g) If the rational function R(x) is even, it has the form $r(x^2)$; if odd, it has the
form $x\,r(x^2)$, where r(x) is a rational function.


h) Any function $R(x, \sqrt{y})$ can be reduced to the form

$$R_1(x,y) + \frac{R_2(x^2,y)}{\sqrt{y}} + \frac{R_3(x^2,y)}{\sqrt{y}}\,x .$$


i) Up to a sum of elementary terms, any integral of the form $\int R\bigl(x, \sqrt{P(x)}\bigr)\,dx$,
where P(x) is a polynomial of degree four, can be reduced to an integral

$$\int \frac{r(t^2)\,dt}{\sqrt{A(1+m_1t^2)(1+m_2t^2)}} ,$$

where r(t) is a rational function and A = ±1.


j) If $|m_1| > |m_2| > 0$, one of the substitutions $\sqrt{m_1}\,t = x$, $\sqrt{m_1}\,t = \sqrt{1-x^2}$,
$\sqrt{m_1}\,t = \frac{x}{\sqrt{1-x^2}}$, and $\sqrt{m_1}\,t = \frac{1}{\sqrt{1-x^2}}$ will reduce the integral
$\int \frac{r(t^2)\,dt}{\sqrt{A(1+m_1t^2)(1+m_2t^2)}}$ to the form $\int \frac{\tilde r(x^2)\,dx}{\sqrt{(1-x^2)(1-k^2x^2)}}$,
where 0 < k < 1 and $\tilde r$ is a rational function.


k) Derive a formula for lowering the exponents 2n and m for the integrals

$$\int \frac{x^{2n}\,dx}{\sqrt{(1-x^2)(1-k^2x^2)}} , \qquad \int \frac{dx}{(x^2-a)^m\cdot\sqrt{(1-x^2)(1-k^2x^2)}} .$$

l) Any elliptic integral

$$\int R\bigl(x, \sqrt{P(x)}\bigr)\,dx ,$$

where P is a fourth-degree polynomial, can be reduced to one of the canonical forms
(5.185), (5.186), (5.187), up to a sum of terms consisting of elementary functions.


m) Express the integral $\int \frac{dx}{\sqrt{1+x^3}}$ in terms of canonical elliptic integrals.


n) Express the primitives of the functions $\frac{1}{\sqrt{\cos 2x}}$ and $\frac{1}{\sqrt{\cos\alpha - \cos x}}$ in terms of
elliptic integrals.


6. Using the notation introduced below, find primitives of the following nonelementary
special functions, up to a linear function Ax + B:


a) $\operatorname{Ei}(x) = \int \frac{e^x}{x}\,dx$ (the exponential integral);

b) $\operatorname{Si}(x) = \int \frac{\sin x}{x}\,dx$ (the sine integral);

c) $\operatorname{Ci}(x) = \int \frac{\cos x}{x}\,dx$ (the cosine integral);

d) $\operatorname{Shi}(x) = \int \frac{\sinh x}{x}\,dx$ (the hyperbolic sine integral);

e) $\operatorname{Chi}(x) = \int \frac{\cosh x}{x}\,dx$ (the hyperbolic cosine integral);

f) $S(x) = \int \sin x^2\,dx$ and

g) $C(x) = \int \cos x^2\,dx$ (the Fresnel integrals);

h) $\Phi(x) = \int e^{-x^2}\,dx$ (the Euler-Poisson integral);

i) $\operatorname{li}(x) = \int \frac{dx}{\ln x}$ (the logarithmic integral).

7. Verify that the following equalities hold, up to a constant:

a) $\operatorname{Ei}(x) = \operatorname{li}(e^x)$;

b) $\operatorname{Chi}(x) = \frac12\bigl[\operatorname{Ei}(x) + \operatorname{Ei}(-x)\bigr]$;

c) $\operatorname{Shi}(x) = \frac12\bigl[\operatorname{Ei}(x) - \operatorname{Ei}(-x)\bigr]$;

d) $\operatorname{Ei}(ix) = \operatorname{Ci}(x) + i\operatorname{Si}(x)$;

e) $e^{i\pi/4}\,\Phi\bigl(x e^{-i\pi/4}\bigr) = C(x) + iS(x)$.

8. A differential equation of the form

$$\frac{dy}{dx} = \frac{f(x)}{g(y)}$$

is called an equation with variables separable, since it can be rewritten in the form

$$g(y)\,dy = f(x)\,dx ,$$

in which the variables x and y are separated. Once this is done, the equation can
be solved as

$$\int g(y)\,dy = \int f(x)\,dx + c ,$$

by computing the corresponding integrals.
Solve the following equations:

a) $2x^3yy' + y^2 = 2$;

b) $xyy' = \sqrt{1+x^2}$;

c) $y' = \cos(y + x)$, setting u(x) = y(x) + x;

d) $x^2y' - \cos 2y = 1$, and exhibit the solution satisfying the condition y(x) → 0
as x → +∞;

e) $\frac{1}{x}\,y'(x) = \operatorname{Si}(x)$;

f) $\frac{y'(x)}{\cos x} = C(x)$.
9. A parachutist has jumped from an altitude of 1.5 km and opened the parachute 
at an altitude of 0.5 km. For how long a time did he fall before opening the 
parachute? Assume the limiting velocity of fall for a human being in air of normal 
density is 50 m/s. Solve this problem assuming that the air resistance is proportional 
to: 


a) the velocity; 


b) the square of the velocity. 
Neglect the variation of pressure with altitude. 


10. It is known that the velocity of outflow of water from a small aperture at the
bottom of a vessel can be computed quite precisely from the formula $0.6\sqrt{2gH}$,
where g is the acceleration of gravity and H the height of the surface of the water
above the aperture.

A cylindrical vat is set upright and has an opening in its bottom. Half of the 
water from the full vat flows out in 5 minutes. How long will it take for all the water 
to flow out? 


11. What shape should a vessel be, given that it is to be a solid of revolution, in 
order for the surface of the water flowing out of the bottom to fall at a constant 
rate as water flows out its bottom? (For the initial data, see Exercise 10). 


12. In a workshop with a capacity of 10⁴ m³ fans deliver 10³ m³ of fresh air per
minute, containing 0.04% CO₂, and the same amount of air is vented to the outside.
At 9:00 AM the workers arrive and after half an hour, the content of CO₂ in the
air rises to 0.12%. Evaluate the carbon dioxide content of the air by 2:00 PM.


6 Integration 


6.1 Definition of the Integral and Description 
of the Set of Integrable Functions 


6.1.1 The Problem and Introductory Considerations 


Suppose a point is moving along the real line, with s(t) being its coordinate
at time t and v(t) = s'(t) its velocity at the same instant t. Assume that we
know the position $s(t_0)$ of the point at time $t_0$ and that we receive information
on its velocity. Having this information, we wish to compute s(t) for any given
value of time $t > t_0$.

If we assume that the velocity v(t) varies continuously, the displacement
of the point over a small time interval can be computed approximately as the
product v(τ)Δt of the velocity at an arbitrary instant τ belonging to that
time interval and the magnitude Δt of the time interval itself. Taking this
observation into account, we partition the interval $[t_0, t]$ by marking some
times $t_i$ (i = 0, ..., n) so that $t_0 < t_1 < \cdots < t_n = t$ and so that the intervals
$[t_{i-1}, t_i]$ are small. Let $\Delta t_i = t_i - t_{i-1}$ and $\tau_i \in [t_{i-1}, t_i]$. Then we have the
approximate equality

$$s(t) - s(t_0) \approx \sum_{i=1}^{n} v(\tau_i)\,\Delta t_i .$$

According to our picture of the situation, this approximate equality will
become more precise if we partition the closed interval $[t_0, t]$ into smaller and
smaller intervals. Thus we must conclude that in the limit as the length λ of
the largest of these intervals tends to zero we shall obtain an exact equality

$$\lim_{\lambda\to 0} \sum_{i=1}^{n} v(\tau_i)\,\Delta t_i = s(t) - s(t_0) . \tag{6.1}$$


This equality is none other than the Newton-Leibniz formula (fundamental
theorem of calculus), which is fundamental in all of analysis. It enables us
on the one hand to find a primitive s(t) numerically from its derivative v(t),
and on the other hand to find the limit of sums $\sum_{i=1}^{n} v(\tau_i)\,\Delta t_i$ on the left-hand
side from a primitive s(t), found by any means whatever.
Such sums, called Riemann sums, are encountered in a wide variety of
situations.
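The passage from velocity to displacement can be illustrated numerically. The sketch below assumes the velocity v(t) = cos t, whose primitive s(t) = sin t is known, and takes each τ_i to be the midpoint of its interval; both choices, and the name `displacement`, are illustrative assumptions, since the text allows any τ_i in the interval.

```python
import math

def displacement(v, t0, t, n):
    """Sum of v(tau_i) * dt over n equal subintervals of [t0, t], tau_i = midpoint."""
    dt = (t - t0) / n
    return sum(v(t0 + (i + 0.5) * dt) for i in range(n)) * dt

v = math.cos                      # assumed velocity; s(t) = sin t is a primitive
t0, t = 0.0, 2.0
exact = math.sin(t) - math.sin(t0)
for n in (10, 100, 1000):
    print(n, displacement(v, t0, t, n), exact)
```

As the subintervals shrink, the sums approach s(t) − s(t₀), in agreement with (6.1).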

Let us attempt, for example, following Archimedes, to find the area under
the parabola $y = x^2$ above the closed interval [0, 1] (see Fig. 6.1). Without
going into detail here as to the meaning of the area of a figure, which we shall
take up later, like Archimedes, we shall work by the method of exhausting the
figure with simple figures, namely rectangles, whose areas we know how to compute.
After partitioning the closed interval [0, 1] by points $0 = x_0 < x_1 < \cdots < x_n = 1$
into tiny closed intervals $[x_{i-1}, x_i]$, we can obviously compute the
required area σ as the sum of the areas of the rectangles shown in the figure:

$$\sigma \approx \sum_{i=1}^{n} x_{i-1}^2\,\Delta x_i ;$$

here $\Delta x_i = x_i - x_{i-1}$. Setting $f(x) = x^2$ and $\xi_i = x_{i-1}$, we rewrite the
formula as

$$\sigma \approx \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i .$$

In this notation we have, in the limit,

$$\lim_{\lambda\to 0} \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i = \sigma , \tag{6.2}$$

where, as above, λ is the length of the longest interval $[x_{i-1}, x_i]$ in the partition.

Formula (6.2) differs from (6.1) only in the notation. Forgetting for a
moment the geometric meaning of $f(\xi_i)\,\Delta x_i$ and regarding x as time and
f(x) as velocity, we find a primitive F(x) for the function f(x) and then, by
formula (6.1) we find that σ = F(1) − F(0).

In our case $f(x) = x^2$, so that $F(x) = \frac13 x^3 + c$, and $\sigma = F(1) - F(0) = \frac13$.
This is Archimedes' result, which he obtained by a direct computation of the
limit in (6.2).
A limit of integral sums is called an integral. Thus the Newton-Leibniz
formula (6.1) connects the integral with the primitive.
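For the parabola, the left-endpoint sums over a partition into n equal intervals can be computed exactly. The following sketch uses rational arithmetic; the equal-interval partition and the function name are illustrative assumptions. The sum equals (n − 1)(2n − 1)/(6n²), which tends to 1/3.

```python
from fractions import Fraction

def left_sum_parabola(n):
    """Exact left-endpoint Riemann sum of x**2 on [0, 1] with n equal intervals."""
    dx = Fraction(1, n)
    # xi_i = x_{i-1} = (i - 1)/n for i = 1..n, i.e. j/n for j = 0..n-1
    return sum((j * dx) ** 2 * dx for j in range(n))

for n in (10, 100, 1000):
    s = left_sum_parabola(n)
    print(n, s, float(Fraction(1, 3) - s))
```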

We now turn to precise formulations and the verification of what was 
obtained above on the heuristic level from general considerations. 


6.1.2 Definition of the Riemann Integral 


a. Partitions 


Definition 1. A partition P of a closed interval [a, b], a < b, is a finite system
of points $x_0, \ldots, x_n$ of the interval such that $a = x_0 < x_1 < \cdots < x_n = b$.

The intervals $[x_{i-1}, x_i]$ (i = 1, ..., n) are called the intervals of the
partition P.

The largest of the lengths of the intervals of the partition P, denoted
λ(P), is called the mesh of the partition.


Definition 2. We speak of a partition with distinguished points (P, ξ) on
the closed interval [a, b] if we have a partition P of [a, b] and a point
$\xi_i \in [x_{i-1}, x_i]$ has been chosen in each of the intervals of the partition $[x_{i-1}, x_i]$
(i = 1, ..., n).

We denote the set of points $(\xi_1, \ldots, \xi_n)$ by the single letter ξ.


b. A Base in the Set of Partitions In the set $\mathcal{P}$ of partitions with
distinguished points on a given interval [a, b], we consider the following base
$\mathcal{B} = \{B_d\}$. The element $B_d$, d > 0, of the base $\mathcal{B}$ consists of all partitions
with distinguished points (P, ξ) on [a, b] for which λ(P) < d.

Let us verify that $\{B_d\}$, d > 0, is actually a base in $\mathcal{P}$.

First, $B_d \ne \varnothing$. In fact, for any number d > 0, it is obvious that there exists
a partition P of [a, b] with mesh λ(P) < d (for example, a partition into n
congruent closed intervals). But then there also exists a partition (P, ξ) with
distinguished points for which λ(P) < d.

Second, if $d_1 > 0$, $d_2 > 0$, and $d = \min\{d_1, d_2\}$, it is obvious that
$B_{d_1} \cap B_{d_2} = B_d \in \mathcal{B}$.

Hence $\mathcal{B} = \{B_d\}$ is indeed a base in $\mathcal{P}$.


c. Riemann Sums 


Definition 3. If a function f is defined on the closed interval [a, b] and (P, ξ)
is a partition with distinguished points on this closed interval, the sum

$$\sigma(f; P, \xi) := \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i , \tag{6.3}$$

where $\Delta x_i = x_i - x_{i-1}$, is the Riemann sum of the function f corresponding
to the partition (P, ξ) with distinguished points on [a, b].
Thus, when the function f is fixed, the Riemann sum σ(f; P, ξ) is a function
Φ(p) = σ(f; p) on the set $\mathcal{P}$ of all partitions p = (P, ξ) with distinguished
points on the closed interval [a, b].

Since there is a base $\mathcal{B}$ in $\mathcal{P}$, one can ask about the limit of the function
Φ(p) over that base.
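Definitions 1 to 3 translate directly into a small computation. In the sketch below a partition with distinguished points is represented by two lists, the partition points and the chosen points; this representation and the names `mesh` and `riemann_sum` are assumptions made for illustration.

```python
def mesh(points):
    """Mesh of a partition: the largest length of its intervals."""
    return max(b - a for a, b in zip(points, points[1:]))

def riemann_sum(f, points, tags):
    """Riemann sum (6.3) for partition points x_0 < ... < x_n and tags xi_1..xi_n."""
    if len(tags) != len(points) - 1:
        raise ValueError("one distinguished point per interval is required")
    total = 0.0
    for a, b, xi in zip(points, points[1:], tags):
        if not (a < b and a <= xi <= b):
            raise ValueError("points must increase and each tag must lie in its interval")
        total += f(xi) * (b - a)
    return total

points = [0.0, 0.5, 1.0]   # partition of [0, 1]
tags = [0.25, 0.75]        # distinguished points, here the midpoints
print(mesh(points), riemann_sum(lambda x: x, points, tags))
```

Fixing f and varying the pair (points, tags) gives exactly the function Φ(p) whose limit over the base is taken in what follows.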


d. The Riemann Integral Let f be a function defined on a closed interval 


[a,b]. 


Definition 4. The number I is the Riemann integral of the function f on
the closed interval [a, b] if for every ε > 0 there exists δ > 0 such that

$$\left| I - \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i \right| < \varepsilon$$

for any partition (P, ξ) with distinguished points on [a, b] whose mesh λ(P)
is less than δ.


Since the partitions p = (P, ξ) for which λ(P) < δ form the element $B_\delta$
of the base $\mathcal{B}$ introduced above in the set $\mathcal{P}$ of partitions with distinguished
points, Definition 4 is equivalent to the statement

$$I = \lim_{\mathcal{B}} \Phi(p) ,$$

that is, the integral I is the limit over $\mathcal{B}$ of the Riemann sums of the function
f corresponding to partitions with distinguished points on [a, b].


It is natural to denote the base $\mathcal{B}$ by λ(P) → 0, and then the definition
of the integral can be rewritten as

$$I = \lim_{\lambda(P)\to 0} \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i . \tag{6.4}$$


The integral of f(x) over [a, b] is denoted

$$\int_a^b f(x)\,dx ,$$

in which the numbers a and b are called respectively the lower and upper
limits of integration. The function f is called the integrand, f(x) dx is called
the differential form, and x is the variable of integration. Thus

$$\int_a^b f(x)\,dx := \lim_{\lambda(P)\to 0} \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i . \tag{6.5}$$

Definition 5. A function f is Riemann integrable on the closed interval [a, b]
if the limit of the Riemann sums in (6.5) exists as λ(P) → 0 (that is, the
Riemann integral of f is defined).


The set of Riemann-integrable functions on a closed interval [a, b] will be
denoted $\mathcal{R}[a, b]$.

Since we shall not be considering any integrals except the Riemann inte- 
gral for a while, we shall agree for the sake of brevity to say simply “inte- 
gral” and “integrable function” instead of “Riemann integral” and “Riemann- 
integrable function”. 


6.1.3 The Set of Integrable Functions 


By the definition of the integral (Definition 4) and its reformulation in the
forms (6.4) and (6.5), an integral is the limit of a certain special function
Φ(p) = σ(f; P, ξ), the Riemann sum, defined on the set $\mathcal{P}$ of partitions
p = (P, ξ) with distinguished points on [a, b]. This limit is taken with respect to
the base $\mathcal{B}$ in $\mathcal{P}$ that we have denoted λ(P) → 0.

Thus the integrability or nonintegrability of a function f on [a, b] depends
on the existence of this limit.

By the Cauchy criterion, this limit exists if and only if for every ε > 0
there exists an element $B_\delta \in \mathcal{B}$ in the base such that

$$|\Phi(p') - \Phi(p'')| < \varepsilon$$

for any two points p', p'' in $B_\delta$.
In more detailed notation, what has just been said means that for any
ε > 0 there exists δ > 0 such that

$$|\sigma(f; P', \xi') - \sigma(f; P'', \xi'')| < \varepsilon$$

or, what is the same,

$$\left| \sum_{i=1}^{n'} f(\xi_i')\,\Delta x_i' - \sum_{i=1}^{n''} f(\xi_i'')\,\Delta x_i'' \right| < \varepsilon \tag{6.6}$$

for any partitions (P', ξ') and (P'', ξ'') with distinguished points on the
interval [a, b] with λ(P') < δ and λ(P'') < δ.

We shall use the Cauchy criterion just stated to find first a simple neces- 
sary condition, then a sufficient condition for Riemann integrability. 


a. A Necessary Condition for Integrability 


Proposition 1. A necessary condition for a function f defined on a closed
interval [a, b] to be Riemann integrable on [a, b] is that f be bounded on [a, b].




In short,

$$(f \in \mathcal{R}[a, b]) \Rightarrow (f \text{ is bounded on } [a, b]) .$$


Proof. If f is not bounded on [a, b], then for any partition P of [a, b] the
function f is unbounded on at least one of the intervals $[x_{i-1}, x_i]$ of P. This
means that, by choosing the point $\xi_i \in [x_{i-1}, x_i]$ in different ways, we can
make the quantity $|f(\xi_i)\,\Delta x_i|$ as large as desired. But then the Riemann sum
$\sigma(f; P, \xi) = \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i$ can also be made as large as desired in absolute
value by changing only the point $\xi_i$ in this interval.

It is clear that there can be no possibility of a finite limit for the Riemann
sums in such a case. That was in any case clear from the Cauchy criterion,
since relation (6.6) cannot hold in that case, even for arbitrarily fine partitions. □


As we shall see, the necessary condition just obtained is far from being 
both necessary and sufficient for integrability. However, it does enable us to 
restrict consideration to bounded functions. 
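The mechanism in the proof of Proposition 1 can be seen on a concrete unbounded function. The sketch below assumes f(x) = 1/x for x > 0 with f(0) = 0 on [0, 1], a partition into n equal intervals, and midpoints as distinguished points everywhere except in the first interval; all of these are illustrative choices.

```python
def f(x):
    """An unbounded function on [0, 1]: 1/x for x > 0, and 0 at x = 0."""
    return 1.0 / x if x > 0 else 0.0

def sum_with_first_tag(n, xi1):
    """Riemann sum on n equal intervals of [0, 1]; only the first tag xi1 varies."""
    dx = 1.0 / n
    if not 0.0 <= xi1 <= dx:
        raise ValueError("xi1 must lie in the first interval [0, 1/n]")
    tags = [xi1] + [(i + 0.5) * dx for i in range(1, n)]
    return sum(f(xi) * dx for xi in tags)

for xi1 in (1e-3, 1e-6, 1e-9):
    print(xi1, sum_with_first_tag(100, xi1))
```

The partition stays fixed while the sums grow without bound, so no finite limit of the Riemann sums can exist.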


b. A Sufficient Condition for Integrability and the Most Impor- 
tant Classes of Integrable Functions We begin with some notation and 
remarks that will be used in the explanation to follow. 

We agree that when a partition P

$$a = x_0 < x_1 < \cdots < x_n = b$$

is given on the interval [a, b], we shall use the symbol $\Delta_i$ to denote the interval
$[x_{i-1}, x_i]$ along with $\Delta x_i$ as a notation for the difference $x_i - x_{i-1}$.

If a partition $\tilde P$ of the closed interval [a, b] is obtained from the partition
P by the adjunction of new points to P, we call $\tilde P$ a refinement of P.

When a refinement $\tilde P$ of a partition P is constructed, some (perhaps all)
of the closed intervals $\Delta_i = [x_{i-1}, x_i]$ of the partition P themselves undergo
partitioning: $x_{i-1} = x_{i0} < \cdots < x_{in_i} = x_i$. In that connection, it will be
useful for us to label the points of $\tilde P$ by double indices. In the notation $x_{ij}$ the
first index means that $x_{ij} \in \Delta_i$, and the second index is the ordinal number of
the point on the closed interval $\Delta_i$. It is now natural to set $\Delta x_{ij} := x_{ij} - x_{ij-1}$
and $\Delta_{ij} := [x_{ij-1}, x_{ij}]$. Thus $\Delta x_i = \Delta x_{i1} + \cdots + \Delta x_{in_i}$.

As an example of a partition that is a refinement of both the partition P'
and P'' one can take $\tilde P = P' \cup P''$, obtained as the union of the points of the
two partitions P' and P''.

We recall finally that, as before, ω(f; E) denotes the oscillation of the
function f on the set E, that is

$$\omega(f; E) := \sup_{x', x'' \in E} |f(x') - f(x'')| .$$

In particular, $\omega(f; \Delta_i)$ is the oscillation of f on the closed interval $\Delta_i$. This
oscillation is necessarily finite if f is a bounded function.
We now state and prove a sufficient condition for integrability.


Proposition 2. A sufficient condition for a bounded function f to be integrable
on a closed interval [a, b] is that for every ε > 0 there exist a number
δ > 0 such that

$$\sum_{i=1}^{n} \omega(f; \Delta_i)\,\Delta x_i < \varepsilon$$

for any partition P of [a, b] with mesh λ(P) < δ.
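Proof. Let P be a partition of [a, b] and $\tilde P$ a refinement of P. Let us estimate
the difference between the Riemann sums $\sigma(f; \tilde P, \tilde\xi) - \sigma(f; P, \xi)$. Using the
notation introduced above, we can write

$$
\begin{aligned}
|\sigma(f; \tilde P, \tilde\xi) - \sigma(f; P, \xi)|
&= \left| \sum_{i=1}^{n}\sum_{j=1}^{n_i} f(\xi_{ij})\,\Delta x_{ij} - \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i \right| = \\
&= \left| \sum_{i=1}^{n}\sum_{j=1}^{n_i} f(\xi_{ij})\,\Delta x_{ij} - \sum_{i=1}^{n}\sum_{j=1}^{n_i} f(\xi_i)\,\Delta x_{ij} \right| = \\
&= \left| \sum_{i=1}^{n}\sum_{j=1}^{n_i} \bigl(f(\xi_{ij}) - f(\xi_i)\bigr)\,\Delta x_{ij} \right|
 \le \sum_{i=1}^{n}\sum_{j=1}^{n_i} |f(\xi_{ij}) - f(\xi_i)|\,\Delta x_{ij} \le \\
&\le \sum_{i=1}^{n}\sum_{j=1}^{n_i} \omega(f; \Delta_i)\,\Delta x_{ij} = \sum_{i=1}^{n} \omega(f; \Delta_i)\,\Delta x_i .
\end{aligned}
$$

In this computation we have used the relation $\Delta x_i = \sum_{j=1}^{n_i} \Delta x_{ij}$ and the
inequality $|f(\xi_{ij}) - f(\xi_i)| \le \omega(f; \Delta_i)$, which holds because $\xi_{ij} \in \Delta_{ij} \subset \Delta_i$ and
$\xi_i \in \Delta_i$.

It follows from the estimate for the difference of the Riemann sums that
if the function satisfies the sufficient condition given in the statement of
Proposition 2, then for any ε > 0 we can find δ > 0 such that

$$|\sigma(f; \tilde P, \tilde\xi) - \sigma(f; P, \xi)| < \frac{\varepsilon}{2}$$

for any partition P of [a, b] with mesh λ(P) < δ, any refinement $\tilde P$ of P, and
any choice of the sets of distinguished points ξ and $\tilde\xi$.

Now if (P', ξ') and (P'', ξ'') are arbitrary partitions with distinguished
points on [a, b] whose meshes satisfy λ(P') < δ and λ(P'') < δ, then, by what
has just been proved, the partition $\tilde P = P' \cup P''$, which is a refinement of
both of them, must satisfy

$$|\sigma(f; \tilde P, \tilde\xi) - \sigma(f; P', \xi')| < \frac{\varepsilon}{2} , \qquad
|\sigma(f; \tilde P, \tilde\xi) - \sigma(f; P'', \xi'')| < \frac{\varepsilon}{2} .$$

It follows that

$$|\sigma(f; P', \xi') - \sigma(f; P'', \xi'')| < \varepsilon ,$$

provided λ(P') < δ and λ(P'') < δ. Therefore, by the Cauchy criterion, the
limit of the Riemann sums exists:

$$\exists \lim_{\lambda(P)\to 0} \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i ,$$

that is, $f \in \mathcal{R}[a, b]$. □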


Corollary 1. $(f \in C[a, b]) \Rightarrow (f \in \mathcal{R}[a, b])$, that is, every continuous function
on a closed interval is integrable on that closed interval.


Proof. If a function is continuous on a closed interval, it is bounded there,
so that the necessary condition for integrability is satisfied in this case. But
a continuous function on a closed interval is uniformly continuous on that
interval. Therefore, for every ε > 0 there exists δ > 0 such that $\omega(f; \Delta) < \frac{\varepsilon}{b-a}$
on any closed interval Δ ⊂ [a, b] of length less than δ. Then for any partition
P with mesh λ(P) < δ we have

$$\sum_{i=1}^{n} \omega(f; \Delta_i)\,\Delta x_i < \frac{\varepsilon}{b-a} \sum_{i=1}^{n} \Delta x_i = \frac{\varepsilon}{b-a}\,(b-a) = \varepsilon .$$

By Proposition 2, we can now conclude that $f \in \mathcal{R}[a, b]$. □


Corollary 2. If a bounded function f on a closed interval [a, b] is continuous
everywhere except at a finite set of points, then $f \in \mathcal{R}[a, b]$.


Proof. Let ω(f; [a, b]) ≤ C < ∞, and suppose f has k points of discontinuity
on [a, b]. We shall verify that the sufficient condition for integrability of the
function f is satisfied.

For a given ε > 0 we choose the number $\delta_1 = \frac{\varepsilon}{8C\cdot k}$ and construct the
$\delta_1$-neighborhood of each of the k points of discontinuity of f on [a, b]. The
complement of the union of these neighborhoods in [a, b] consists of a finite
number of closed intervals, on each of which f is continuous and hence uniformly
continuous. Since the number of these intervals is finite, given ε > 0
there exists $\delta_2 > 0$ such that on each interval Δ whose length is less than
$\delta_2$ and which is entirely contained in one of the closed intervals just mentioned,
on which f is continuous, we have $\omega(f; \Delta) < \frac{\varepsilon}{2(b-a)}$. We now choose
$\delta = \min\{\delta_1, \delta_2\}$.

Let P be an arbitrary partition of [a, b] for which λ(P) < δ. We break the
sum $\sum_{i=1}^{n} \omega(f; \Delta_i)\,\Delta x_i$ corresponding to the partition P into two parts:

$$\sum_{i=1}^{n} \omega(f; \Delta_i)\,\Delta x_i = {\sum}' \omega(f; \Delta_i)\,\Delta x_i + {\sum}'' \omega(f; \Delta_i)\,\Delta x_i .$$

The sum $\sum'$ contains the terms corresponding to intervals $\Delta_i$ of the partition
having no points in common with any of the $\delta_1$-neighborhoods of the points
of discontinuity. For these intervals $\Delta_i$ we have $\omega(f; \Delta_i) < \frac{\varepsilon}{2(b-a)}$, and so

$${\sum}' \omega(f; \Delta_i)\,\Delta x_i < \frac{\varepsilon}{2(b-a)} {\sum}' \Delta x_i \le \frac{\varepsilon}{2(b-a)}\,(b-a) = \frac{\varepsilon}{2} .$$

The sum of the lengths of the remaining intervals of the partition P, as
one can easily see, is at most $(\delta + 2\delta_1 + \delta)k \le 4\,\frac{\varepsilon}{8C\cdot k}\cdot k = \frac{\varepsilon}{2C}$, and therefore

$${\sum}'' \omega(f; \Delta_i)\,\Delta x_i \le C {\sum}'' \Delta x_i \le C\cdot\frac{\varepsilon}{2C} = \frac{\varepsilon}{2} .$$

Thus we find that for λ(P) < δ,

$$\sum_{i=1}^{n} \omega(f; \Delta_i)\,\Delta x_i < \varepsilon ,$$

that is, the sufficient condition for integrability holds, and so $f \in \mathcal{R}[a, b]$. □


Corollary 3. A monotonic function on a closed interval is integrable on that
interval.


Proof. It follows from the monotonicity of f on [a, b] that ω(f; [a, b]) =
|f(b) − f(a)|. Suppose ε > 0 is given. We set $\delta = \frac{\varepsilon}{|f(b) - f(a)|}$. We assume
that f(b) − f(a) ≠ 0, since otherwise f is constant, and there is no doubt
as to its integrability. Let P be an arbitrary partition of [a, b] with mesh
λ(P) < δ.

Then, taking account of the monotonicity of f, we have

$$
\begin{aligned}
\sum_{i=1}^{n} \omega(f; \Delta_i)\,\Delta x_i
&< \delta \sum_{i=1}^{n} \omega(f; \Delta_i) = \delta \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| = \\
&= \delta \left| \sum_{i=1}^{n} \bigl(f(x_i) - f(x_{i-1})\bigr) \right| = \delta\,|f(b) - f(a)| = \varepsilon .
\end{aligned}
$$

Thus f satisfies the sufficient condition for integrability, and therefore
$f \in \mathcal{R}[a, b]$. □
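The estimate in the proof of Corollary 3 can be checked on a concrete monotonic function. For such f the oscillation on each interval is |f(x_i) − f(x_{i−1})|, so the sum of ω(f; Δ_i)Δx_i can be computed exactly from the partition points. The sketch below assumes f = exp on [0, 1] and two illustrative partitions; the function names are hypothetical.

```python
import math

def mesh(points):
    """Largest interval length of the partition."""
    return max(b - a for a, b in zip(points, points[1:]))

def oscillation_sum_monotone(f, points):
    """Sum of w(f; D_i) * dx_i, using w(f; D_i) = |f(x_i) - f(x_{i-1})| for monotonic f."""
    return sum(abs(f(b) - f(a)) * (b - a) for a, b in zip(points, points[1:]))

f = math.exp
uneven = [0.0, 0.1, 0.4, 0.5, 1.0]
uniform = [i / 100 for i in range(101)]

for pts in (uneven, uniform):
    bound = mesh(pts) * abs(f(pts[-1]) - f(pts[0]))
    print(oscillation_sum_monotone(f, pts), "<=", bound)
```

On a partition into equal intervals the bound is attained, so the sum is exactly λ(P)·|f(b) − f(a)| up to rounding.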


A monotonic function may have a (countably) infinite set of discontinuities
on a closed interval. For example, the function defined by the relations

$$
f(x) =
\begin{cases}
1 - \dfrac{1}{2^{n-1}} & \text{for } 1 - \dfrac{1}{2^{n-1}} \le x < 1 - \dfrac{1}{2^{n}},\ n \in \mathbb{N}, \\[2mm]
1 & \text{for } x = 1
\end{cases}
$$

on [0, 1] is nondecreasing and has a discontinuity at every point of the form
$1 - \frac{1}{2^n}$, $n \in \mathbb{N}$.

Remark. We note that, although we are dealing at the moment with real-valued
functions on an interval, we have made no use of the assumption
that the functions are real-valued rather than complex-valued or even vector-valued
functions of a point of the closed interval [a, b], either in the definition
of the integral or in the propositions proved above, except Corollary 3.

On the other hand, the concept of upper and lower Riemann sums, to 
which we now turn, is specific to real-valued functions. 


Definition 6. Let f : [a, b] → ℝ be a real-valued function that is defined
and bounded on the closed interval [a, b], let P be a partition of [a, b], and
let $\Delta_i$ (i = 1, ..., n) be the intervals of the partition P. Let $m_i = \inf_{x \in \Delta_i} f(x)$
and $M_i = \sup_{x \in \Delta_i} f(x)$ (i = 1, ..., n).

The sums

$$s(f; P) := \sum_{i=1}^{n} m_i\,\Delta x_i$$

and

$$S(f; P) := \sum_{i=1}^{n} M_i\,\Delta x_i$$

are called respectively the lower and upper Riemann sums of the function f
on the interval [a, b] corresponding to the partition P of that interval.¹ The
sums s(f; P) and S(f; P) are also called the lower and upper Darboux sums
corresponding to the partition P of [a, b].


If (P, ξ) is an arbitrary partition with distinguished points on [a, b], then
obviously

$$s(f; P) \le \sigma(f; P, \xi) \le S(f; P) . \tag{6.7}$$
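Inequality (6.7) is easy to observe numerically when the infimum and supremum on each interval are known. The sketch below assumes an increasing function, f(x) = x³ on [0, 2], so that m_i = f(x_{i−1}) and M_i = f(x_i); the random choice of distinguished points, the seed, and the function names are illustrative assumptions.

```python
import math
import random

def darboux_sums_increasing(f, points):
    """Lower and upper Darboux sums for an increasing f (inf and sup at the endpoints)."""
    lower = sum(f(a) * (b - a) for a, b in zip(points, points[1:]))
    upper = sum(f(b) * (b - a) for a, b in zip(points, points[1:]))
    return lower, upper

def riemann_sum(f, points, tags):
    return sum(f(xi) * (b - a) for a, b, xi in zip(points, points[1:], tags))

def f(x):
    return x ** 3

rng = random.Random(0)                        # fixed seed for reproducibility
points = [2.0 * i / 50 for i in range(51)]    # 50 equal intervals on [0, 2]
lower, upper = darboux_sums_increasing(f, points)

sigmas = []
for _ in range(5):
    tags = [rng.uniform(a, b) for a, b in zip(points, points[1:])]
    sigmas.append(riemann_sum(f, points, tags))
print(lower, sigmas, upper)
```

For an increasing function on equal intervals, S(f; P) − s(f; P) equals (f(b) − f(a))·Δx, here 8 · 0.04 = 0.32, which shrinks as the partition is refined.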


Lemma 1.

$$s(f; P) = \inf_{\xi} \sigma(f; P, \xi) , \qquad S(f; P) = \sup_{\xi} \sigma(f; P, \xi) .$$


Proof. Let us verify, for example, that the upper Darboux sum corresponding
to a partition P of the closed interval [a, b] is the least upper bound of
the Riemann sums corresponding to the partitions with distinguished points
(P, ξ), the supremum being taken over all sets $\xi = (\xi_1, \ldots, \xi_n)$ of distinguished
points.

In view of (6.7), it suffices to prove that for any ε > 0 there is a set $\bar\xi$ of
distinguished points such that

$$S(f; P) < \sigma(f; P, \bar\xi) + \varepsilon . \tag{6.8}$$

By definition of the numbers $M_i$, for each $i \in \{1, \ldots, n\}$ there is a point
$\bar\xi_i \in \Delta_i$ at which $M_i < f(\bar\xi_i) + \frac{\varepsilon}{b-a}$. Let $\bar\xi = (\bar\xi_1, \ldots, \bar\xi_n)$. Then

$$\sum_{i=1}^{n} M_i\,\Delta x_i < \sum_{i=1}^{n} \left( f(\bar\xi_i) + \frac{\varepsilon}{b-a} \right) \Delta x_i = \sum_{i=1}^{n} f(\bar\xi_i)\,\Delta x_i + \varepsilon ,$$

which completes the proof of the second assertion of the lemma. The first
assertion is verified similarly. □

1 The term "Riemann sum" here is not quite accurate, since $m_i$ and $M_i$ are not
always values of the function f at some point $\xi_i \in \Delta_i$.


From this lemma and inequality (6.7), taking account of the definition of 
the Riemann integral, we deduce the following proposition. 


Proposition 3. A bounded real-valued function f : [a, b] → ℝ is Riemann-integrable
on [a, b] if and only if the following limits exist and are equal to
each other:

$$\underline{I} = \lim_{\lambda(P)\to 0} s(f; P) , \qquad \overline{I} = \lim_{\lambda(P)\to 0} S(f; P) . \tag{6.9}$$

When this happens, the common value $I = \underline{I} = \overline{I}$ is the integral

$$\int_a^b f(x)\,dx .$$


Proof. Indeed, if the limits (6.9) exist and are equal, we conclude by the
properties of limits and by (6.7) that the Riemann sums have a limit and
that

$$\underline{I} = \lim_{\lambda(P)\to 0} \sigma(f; P, \xi) = \overline{I} .$$

On the other hand, if $f \in \mathcal{R}[a, b]$, that is, the limit

$$\lim_{\lambda(P)\to 0} \sigma(f; P, \xi) = I$$

exists, we conclude from (6.7) and (6.8) that the limit $\lim_{\lambda(P)\to 0} S(f; P) = \overline{I}$
exists and $\overline{I} = I$.
Similarly one can verify that $\lim_{\lambda(P)\to 0} s(f; P) = \underline{I} = I$. □


As a corollary of Proposition 3, we obtain the following sharpening of 
Proposition 2. 


Proposition 2'. A necessary and sufficient condition for a function f :
[a, b] → ℝ defined on a closed interval [a, b] to be Riemann integrable on [a, b]
is the following relation:

$$\lim_{\lambda(P)\to 0} \sum_{i=1}^{n} \omega(f; \Delta_i)\,\Delta x_i = 0 . \tag{6.10}$$

Proof. Taking account of Proposition 2, we have only to verify that condition
(6.10) is necessary for f to be integrable.
We remark that $\omega(f; \Delta_i) = M_i - m_i$, and therefore

$$\sum_{i=1}^{n} \omega(f; \Delta_i)\,\Delta x_i = \sum_{i=1}^{n} (M_i - m_i)\,\Delta x_i = S(f; P) - s(f; P) ,$$

and (6.10) now follows from Proposition 3 if $f \in \mathcal{R}[a, b]$. □
c. The Vector Space $\mathcal{R}[a,b]$ Many operations can be performed on
integrable functions without going outside the class of integrable functions
$\mathcal{R}[a,b]$.

Proposition 4. If $f, g \in \mathcal{R}[a,b]$, then

a) $(f + g) \in \mathcal{R}[a,b]$;

b) $(\alpha f) \in \mathcal{R}[a,b]$, where $\alpha$ is a numerical coefficient;

c) $|f| \in \mathcal{R}[a,b]$;

d) $f|_{[c,d]} \in \mathcal{R}[c,d]$ if $[c,d] \subset [a,b]$;

e) $(f \cdot g) \in \mathcal{R}[a,b]$.


We are considering only real-valued functions at the moment, but it is 
worthwhile to note that properties a), b), c), and d) turn out to be valid for 
complex-valued and vector-valued functions. For vector-valued functions, in 
general, the product $f \cdot g$ is not defined, so that property e) is not considered
for them. However, this property continues to hold for complex-valued
functions. 

We now turn to the proof of Proposition 4. 


Proof. a) This assertion is obvious since

$$\sum_{i=1}^{n} (f+g)(\xi_i)\,\Delta x_i = \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i + \sum_{i=1}^{n} g(\xi_i)\,\Delta x_i.$$

b) This assertion is obvious, since

$$\sum_{i=1}^{n} (\alpha f)(\xi_i)\,\Delta x_i = \alpha \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i.$$

c) Since $\omega(|f|;E) \le \omega(f;E)$, we can write

$$\sum_{i=1}^{n} \omega(|f|;\Delta_i)\,\Delta x_i \le \sum_{i=1}^{n} \omega(f;\Delta_i)\,\Delta x_i,$$

and conclude by Proposition 2 that $(f \in \mathcal{R}[a,b]) \Rightarrow (|f| \in \mathcal{R}[a,b])$.

d) We want to verify that the restriction $f|_{[c,d]}$ to $[c,d]$ of a function $f$ that
is integrable on the closed interval $[a,b]$ is integrable on $[c,d]$ if $[c,d] \subset [a,b]$.
Let $\pi$ be a partition of $[c,d]$. By adding points to $\pi$, we extend it to a partition
$P$ of the closed interval $[a,b]$ so as to have $\lambda(P) \le \lambda(\pi)$. It is clear that one
can always do this.

We can then write

$$\sum_{\pi} \omega(f|_{[c,d]};\Delta_i)\,\Delta x_i \le \sum_{P} \omega(f;\Delta_i)\,\Delta x_i,$$

where $\sum_{\pi}$ is the sum over all the intervals of the partition $\pi$ and $\sum_{P}$ the
sum over all the intervals of $P$.

By construction, as $\lambda(\pi) \to 0$ we have $\lambda(P) \to 0$ also, and so by
Proposition 2′ we conclude from this inequality that $(f \in \mathcal{R}[a,b]) \Rightarrow (f \in \mathcal{R}[c,d])$ if
$[c,d] \subset [a,b]$.

e) We first verify that if $f \in \mathcal{R}[a,b]$, then $f^2 \in \mathcal{R}[a,b]$.

If $f \in \mathcal{R}[a,b]$, then $f$ is bounded on $[a,b]$. Let $|f(x)| \le C < \infty$ on $[a,b]$.
Then

$$|f^2(x_1) - f^2(x_2)| = |(f(x_1) + f(x_2)) \cdot (f(x_1) - f(x_2))| \le 2C\,|f(x_1) - f(x_2)|,$$

and therefore $\omega(f^2;E) \le 2C\,\omega(f;E)$ if $E \subset [a,b]$. Hence

$$\sum_{i=1}^{n} \omega(f^2;\Delta_i)\,\Delta x_i \le 2C \sum_{i=1}^{n} \omega(f;\Delta_i)\,\Delta x_i,$$

from which we conclude by Proposition 2′ that

$$(f \in \mathcal{R}[a,b]) \Rightarrow (f^2 \in \mathcal{R}[a,b]).$$

We now turn to the general case. We write the identity

$$(f \cdot g)(x) = \frac{1}{4} \bigl[ (f+g)^2(x) - (f-g)^2(x) \bigr].$$

From this identity and the result just proved, together with a) and b), which
have already been proved, we conclude that

$$(f \in \mathcal{R}[a,b]) \wedge (g \in \mathcal{R}[a,b]) \Rightarrow (f \cdot g \in \mathcal{R}[a,b]).$$ □


You already know what a vector space is from your study of algebra. The 
real-valued functions defined on a set can be added and multiplied by a real 
number, both operations being performed pointwise, and the result is another 
real-valued function on the same set. If functions are regarded as vectors, one 
can verify that all the axioms of a vector space over the field of real numbers 
hold, and the set of real-valued functions is a vector space with respect to 
the operations of pointwise addition and multiplication by real numbers.

In parts a) and b) of Proposition 4 it was asserted that addition of
integrable functions and multiplication of an integrable function by a number
do not lead outside the class $\mathcal{R}[a,b]$ of integrable functions. Thus $\mathcal{R}[a,b]$ is
itself a vector space, namely a subspace of the vector space of real-valued functions
defined on the closed interval $[a,b]$.

d. Lebesgue’s Criterion for Riemann Integrability of a Function
In conclusion we present, without proof for the time being, a theorem of
Lebesgue giving an intrinsic description of a Riemann-integrable function.

To do this, we introduce the following concept, which is useful in its own
right.

Definition 7. A set $E \subset \mathbb{R}$ has measure zero or is of measure zero (in the
sense of Lebesgue) if for every number $\varepsilon > 0$ there exists a covering of the
set $E$ by an at most countable system $\{I_k\}$ of intervals, the sum of whose
lengths $\sum_{k=1}^{\infty} |I_k|$ is at most $\varepsilon$.

Since the series $\sum_{k=1}^{\infty} |I_k|$ converges absolutely, the order of summation of
the lengths of the intervals of the covering does not affect the sum (see
Proposition 4 of Subsect. 5.5.2), so that this definition is unambiguous.


Lemma 2. a) A single point and a finite number of points are sets of measure 
zero. 


b) The union of a finite or countable number of sets of measure zero is a 
set of measure zero. 


c) A subset of a set of measure zero is itself of measure zero. 
d) A closed interval [a,b] with a < b is not a set of measure zero. 


Proof. a) A point can be covered by one interval of length less than any
preassigned number $\varepsilon > 0$; therefore a point is a set of measure zero. The
rest of a) then follows from b).

b) Let $E = \bigcup_n E^n$ be an at most countable union of sets $E^n$ of measure
zero. Given $\varepsilon > 0$, for each $E^n$ we construct a covering $\{I_k^n\}$ of $E^n$ such that
$\sum_k |I_k^n| < \frac{\varepsilon}{2^n}$.

Since the union of an at most countable collection of at most countably
many sets is itself at most countable, the intervals $I_k^n$, $k, n \in \mathbb{N}$, form an at
most countable covering of the set $E$, and

$$\sum_{n,k} |I_k^n| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2^2} + \cdots + \frac{\varepsilon}{2^n} + \cdots = \varepsilon.$$

The order of summation $\sum_{n,k} |I_k^n|$ on the indices $n$ and $k$ is of no importance,
since the series converges to the same sum for any order of summation if it
converges in even one ordering. Such is the case here, since any partial sums
of the series are bounded above by $\varepsilon$.
Thus $E$ is a set of measure zero in the sense of Lebesgue.


c) This statement follows immediately from the definition of a
set of measure zero and the definition of a covering.

d) Since every covering of a closed interval by open intervals contains a
finite covering, the sum of the lengths of which obviously does not exceed the
sum of the lengths of the intervals in the original covering, it suffices to verify
that the sum of the lengths of open intervals forming a finite covering of a
closed interval $[a,b]$ is not less than the length $b - a$ of that closed interval.

We shall carry out an induction on the number of intervals in the covering.

For $n = 1$, that is, when the closed interval $[a,b]$ is contained in one open
interval $(\alpha,\beta)$, it is obvious that $\alpha < a < b < \beta$ and $\beta - \alpha > b - a$.

Suppose the statement is proved up to index $k \in \mathbb{N}$ inclusive. Consider
a covering consisting of $k + 1$ open intervals. We take an interval $(\alpha_1,\alpha_2)$
containing the point $a$. If $\alpha_2 > b$, then $\alpha_2 - \alpha_1 > b - a$, and the result is
proved. If $a < \alpha_2 \le b$, the closed interval $[\alpha_2,b]$ is covered by a system of
at most $k$ intervals, the sum of whose lengths, by the induction assumption,
must be at least $b - \alpha_2$. But

$$b - a = (b - \alpha_2) + \alpha_2 - a < (b - \alpha_2) + (\alpha_2 - \alpha_1),$$

and so the sum of the lengths of all the intervals of the original covering of
the closed interval $[a,b]$ was greater than its length $b - a$. □


It is interesting to note that by a) and b) of Lemma 2 the set Q of rational 
points on the real line R is a set of measure zero, which seems rather surprising 
at first sight, upon comparison with part d) of the same lemma. 
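The bookkeeping behind this observation can be sketched in Python. The sketch assumes some enumeration of the rational points has been fixed and that the $k$-th point is covered by an interval of length $\varepsilon/2^k$; the helper name `cover_lengths` is hypothetical, and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction


def cover_lengths(eps, count):
    """Lengths eps/2**k (k = 1..count) of intervals covering the first `count` enumerated points."""
    return [eps / 2**k for k in range(1, count + 1)]


eps = Fraction(1, 100)
total = sum(cover_lengths(eps, 50))
print(total < eps)    # every partial sum stays strictly below eps
print(eps - total)    # the remaining slack is eps / 2**50
```

However many points are covered, the total length never reaches $\varepsilon$, which is exactly why a countable set has measure zero.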


Definition 8. If a property holds at all points of a set X except possibly 
the points of a set of measure zero, we say that this property holds almost 
everywhere on X or at almost every point of X. 


We now state Lebesgue’s criterion for integrability. 


Theorem. A function defined on a closed interval is Riemann integrable on
that interval if and only if it is bounded and continuous at almost every point.

Thus,

$$(f \in \mathcal{R}[a,b]) \Leftrightarrow (f \text{ is bounded on } [a,b]) \wedge (f \text{ is continuous almost everywhere on } [a,b]).$$


It is obvious that one can easily derive Corollaries 1, 2, and 3 and Propo- 
sition 4 from the Lebesgue criterion and the properties of sets of measure 
zero proved in Lemma 2. 

We shall not prove the Lebesgue criterion here, since we do not need it 
to work with the rather regular functions we shall be dealing with for the 
present. However, the essential ideas involved in the Lebesgue criterion can 
be explained immediately. 




Proposition 2′ contained a criterion for integrability expressed by relation
(6.10). The sum $\sum_{i=1}^{n} \omega(f;\Delta_i)\,\Delta x_i$ can be small on the one hand because of
the factors $\omega(f;\Delta_i)$, which are small in small neighborhoods of points of
continuity of the function. But if some of the closed intervals $\Delta_i$ contain
points of discontinuity of the function, then $\omega(f;\Delta_i)$ does not tend to zero
for these points, no matter how fine we make the partition $P$ of the closed
interval $[a,b]$. However, $\omega(f;\Delta_i) \le \omega(f;[a,b]) < \infty$ since $f$ is bounded on
$[a,b]$. Hence the sum of the terms containing points of discontinuity will also
be small if the sum of the lengths of the intervals of the partition that cover
the set of points of discontinuity is small; more precisely, if the increase in the
oscillation of the function on some intervals of the partition is compensated
for by the smallness of the total lengths of these intervals.

A precise realization and formulation of these observations amounts to
the Lebesgue criterion.

We now give two classical examples to clarify the property of Riemann 
integrability for a function. 


Example 1. The Dirichlet function

$$D(x) = \begin{cases} 1 & \text{for } x \in \mathbb{Q}, \\ 0 & \text{for } x \in \mathbb{R} \setminus \mathbb{Q}, \end{cases}$$

on the interval $[0,1]$ is not integrable on that interval, since for any partition
$P$ of $[0,1]$ one can find in each interval $\Delta_i$ of the partition both a rational
point $\xi_i'$ and an irrational point $\xi_i''$. Then

$$\sigma(D;P,\xi') = \sum_{i=1}^{n} 1 \cdot \Delta x_i = 1,$$

while

$$\sigma(D;P,\xi'') = \sum_{i=1}^{n} 0 \cdot \Delta x_i = 0.$$

Thus the Riemann sums of the function $D(x)$ cannot have a limit as
$\lambda(P) \to 0$.
From the point of view of the Lebesgue criterion the nonintegrability of
the Dirichlet function is also obvious, since $D(x)$ is discontinuous at every
point of $[0,1]$, which, as was shown in Lemma 2, is not a set of measure zero.


Example 2. Consider the Riemann function

$$R(x) = \begin{cases} \frac{1}{n}, & \text{if } x \in \mathbb{Q} \text{ and } x = \frac{m}{n} \text{ is in lowest terms}, \\ 0, & \text{if } x \in \mathbb{R} \setminus \mathbb{Q}. \end{cases}$$

We have already studied this function in Subsect. 4.1.2, and we know that
$R(x)$ is continuous at all irrational points and discontinuous at all rational
points except 0. Thus the set of points of discontinuity of $R(x)$ is countable
and hence has measure zero. By the Lebesgue criterion, $R(x)$ is Riemann
integrable on any interval $[a,b] \subset \mathbb{R}$, despite there being a discontinuity of
this function in every interval of every partition of the interval of integration.
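A small exact computation makes this plausible. The Python sketch below evaluates Riemann sums of the Riemann function on $[0,1]$ with rational distinguished points, that is, at points where the function is nonzero. The names `riemann_function` and `tagged_sum`, the uniform partition, and the choice of right endpoints as tags are assumptions made for this illustration only.

```python
from fractions import Fraction


def riemann_function(x):
    """R(x) = 1/n for a rational x = m/n in lowest terms (Fraction keeps lowest terms)."""
    return Fraction(1, x.denominator)


def tagged_sum(n):
    """Riemann sum of R on [0, 1]: n equal intervals, rational tags at the right endpoints i/n."""
    return sum(riemann_function(Fraction(i, n)) for i in range(1, n + 1)) / n


for n in (8, 100, 1000):
    print(n, float(tagged_sum(n)))  # the sums decrease toward 0 as the partition is refined
```

Since every interval also contains irrational points, where the function vanishes, sums with irrational tags are all zero; the shrinking sums with rational tags are consistent with the integral being 0.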


Example 3. Now let us consider a less classical problem and example.

Let $f : [a,b] \to \mathbb{R}$ be a function that is integrable on $[a,b]$, assuming values
in the interval $[c,d]$ on which a continuous function $g : [c,d] \to \mathbb{R}$ is defined.
Then the composition $g \circ f : [a,b] \to \mathbb{R}$ is obviously defined and continuous
at all the points of $[a,b]$ where $f$ is continuous. By the Lebesgue criterion, it
follows that $(g \circ f) \in \mathcal{R}[a,b]$.

We shall now show that the composition of two arbitrary integrable
functions is not always integrable.

Consider the function $g(x) = |\operatorname{sgn}|(x)$. This function equals 1 for $x \ne 0$
and 0 for $x = 0$. By inspection, we can verify that if we take, say, the Riemann
function $f$ on the closed interval $[1,2]$, then the composition $(g \circ f)(x)$ on
that interval is precisely the Dirichlet function $D(x)$. Thus the presence of
even one discontinuity of the function $g(x)$ has led to nonintegrability of the
composition $g \circ f$.


6.1.4 Problems and Exercises 


1. The theorem of Darboux. a) Let $s(f;P)$ and $S(f;P)$ be the lower and upper
Darboux sums of a real-valued function $f$ defined and bounded on the closed interval
$[a,b]$ and corresponding to a partition $P$ of that interval. Show that

$$s(f;P_1) \le S(f;P_2)$$

for any two partitions $P_1$ and $P_2$ of $[a,b]$.

b) Suppose the partition $\tilde P$ is a refinement of the partition $P$ of the interval
$[a,b]$, and let $\Delta_{i_1},\ldots,\Delta_{i_k}$ be the intervals of the partition $P$ that contain points of
the partition $\tilde P$ that do not belong to $P$. Show that the following estimates hold:

$$0 \le S(f;P) - S(f;\tilde P) \le \omega(f;[a,b]) \cdot (\Delta x_{i_1} + \cdots + \Delta x_{i_k}),$$
$$0 \le s(f;\tilde P) - s(f;P) \le \omega(f;[a,b]) \cdot (\Delta x_{i_1} + \cdots + \Delta x_{i_k}).$$

c) The quantities $\underline{I} = \sup_P s(f;P)$ and $\overline{I} = \inf_P S(f;P)$ are called respectively the
lower Darboux integral and the upper Darboux integral of $f$ on the closed interval
$[a,b]$. Show that $\underline{I} \le \overline{I}$.

d) Prove the theorem of Darboux:

$$\underline{I} = \lim_{\lambda(P)\to 0} s(f;P), \qquad \overline{I} = \lim_{\lambda(P)\to 0} S(f;P).$$

e) Show that $(f \in \mathcal{R}[a,b]) \Leftrightarrow (\underline{I} = \overline{I})$.
f) Show that $f \in \mathcal{R}[a,b]$ if and only if for every $\varepsilon > 0$ there exists a partition $P$
of $[a,b]$ such that $S(f;P) - s(f;P) < \varepsilon$.

2. The Cantor set of Lebesgue measure zero. a) The Cantor set described in
Problem 7 of Sect. 2.4 is uncountable. Verify that it nevertheless is a set of measure 0
in the sense of Lebesgue. Show how to modify the construction of the Cantor set
in order to obtain an analogous set “full of holes” that is not a set of measure zero.
(Such a set is also called a Cantor set.)

b) Show that the function on $[0,1]$ defined to be zero outside a Cantor set and
1 on the Cantor set is Riemann integrable if and only if the Cantor set has measure
zero.

c) Construct a nondecreasing continuous and nonconstant function on $[0,1]$ that
has a derivative equal to zero everywhere except at the points of a Cantor set of
measure zero.


3. The Lebesgue criterion. a) Verify directly (without using the Lebesgue criterion)
that the Riemann function of Example 2 is integrable.

b) Show that a bounded function $f$ belongs to $\mathcal{R}[a,b]$ if and only if for any two
numbers $\varepsilon > 0$ and $\delta > 0$ there is a partition $P$ of $[a,b]$ such that the sum of the
lengths of the intervals of the partition on which the oscillation of the function is
larger than $\varepsilon$ is at most $\delta$.

c) Show that $f \in \mathcal{R}[a,b]$ if and only if $f$ is bounded on $[a,b]$ and for any $\varepsilon > 0$
and $\delta > 0$ the set of points in $[a,b]$ at which $f$ has oscillation larger than $\varepsilon$ can be
covered by a finite set of open intervals the sum of whose lengths is less than $\delta$ (the
du Bois-Reymond criterion).²

d) Using the preceding problem, prove the Lebesgue criterion for Riemann
integrability of a function.


4. Show that if $f, g \in \mathcal{R}[a,b]$ and $f$ and $g$ are real-valued, then $\max\{f,g\} \in \mathcal{R}[a,b]$
and $\min\{f,g\} \in \mathcal{R}[a,b]$.

5. Show that

a) if $f, g \in \mathcal{R}[a,b]$ and $f(x) = g(x)$ almost everywhere on $[a,b]$, then
$\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$;

b) if $f \in \mathcal{R}[a,b]$ and $f(x) = g(x)$ almost everywhere on $[a,b]$, then $g$ can fail to
be Riemann-integrable on $[a,b]$, even if $g$ is defined and bounded on $[a,b]$.


6. Integration of vector-valued functions. a) Let $\mathbf{r}(t)$ be the radius-vector of a point
moving in space, $\mathbf{r}_0 = \mathbf{r}(0)$ the initial position of the point, and $\mathbf{v}(t)$ the velocity
vector as a function of time. Show how to recover $\mathbf{r}(t)$ from $\mathbf{r}_0$ and $\mathbf{v}(t)$.

b) Does the integration of vector-valued functions reduce to integrating real- 
valued functions? 

c) Is the criterion for integrability stated in Proposition 2’ valid for vector-valued 
functions? 

d) Is Lebesgue’s criterion for integrability valid for vector-valued functions? 

e) Which concepts and facts from this section extend to functions with complex 
values? 


2 P. du Bois-Reymond (1831-1889), German mathematician.


6.2 Linearity, Additivity and Monotonicity
of the Integral


6.2.1 The Integral as a Linear Function on the Space $\mathcal{R}[a,b]$


Theorem 1. If $f$ and $g$ are integrable functions on the closed interval $[a,b]$,
a linear combination of them $\alpha f + \beta g$ is also integrable on $[a,b]$, and

$$\int_a^b (\alpha f + \beta g)(x)\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx. \tag{6.11}$$


Proof. Consider a Riemann sum for the integral on the left-hand side of
(6.11), and transform it as follows:

$$\sum_{i=1}^{n} (\alpha f + \beta g)(\xi_i)\,\Delta x_i = \alpha \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i + \beta \sum_{i=1}^{n} g(\xi_i)\,\Delta x_i. \tag{6.12}$$

Since the right-hand side of this last equality tends to the linear combination
of integrals that makes up the right-hand side of (6.11) if the mesh $\lambda(P)$ of
the partition tends to 0, the left-hand side of (6.12) must also have a limit as
$\lambda(P) \to 0$, and that limit must be the same as the limit on the right. Thus
$(\alpha f + \beta g) \in \mathcal{R}[a,b]$ and Eq. (6.11) holds. □
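The key step of this proof, the identity between Riemann sums for one fixed partition with distinguished points, can be checked exactly in Python. The sketch below is illustrative only: the helper `riemann_sum`, the partition of $[0,1]$ into four parts, the midpoint tags, the two polynomials, and the coefficients are all assumptions chosen for the example.

```python
from fractions import Fraction


def riemann_sum(func, points, tags):
    """Riemann sum of func for partition points x_0 < ... < x_n and tags xi_i in [x_{i-1}, x_i]."""
    return sum(func(t) * (points[i + 1] - points[i]) for i, t in enumerate(tags))


points = [Fraction(i, 4) for i in range(5)]                  # partition of [0, 1]
tags = [(points[i] + points[i + 1]) / 2 for i in range(4)]   # midpoints as distinguished points

f = lambda x: x * x
g = lambda x: 3 * x + 1
alpha, beta = Fraction(2), Fraction(-5)

lhs = riemann_sum(lambda x: alpha * f(x) + beta * g(x), points, tags)
rhs = alpha * riemann_sum(f, points, tags) + beta * riemann_sum(g, points, tags)
print(lhs == rhs)  # the two sides agree for this (and any) tagged partition
```

Because the equality holds for every tagged partition, it survives the passage to the limit, which is all the proof uses.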


If we regard $\mathcal{R}[a,b]$ as a vector space over the field of real numbers and
the integral $\int_a^b f(x)\,dx$ as a real-valued function defined on vectors of $\mathcal{R}[a,b]$,
Theorem 1 asserts that the integral is a linear function on the vector space
$\mathcal{R}[a,b]$.
To avoid any possible confusion, functions defined on functions are usually
called functionals. Thus we have proved that the integral is a linear functional
on the vector space of integrable functions.


6.2.2 The Integral as an Additive Function 
of the Interval of Integration 


The value of the integral $\int_a^b f(x)\,dx = I(f;[a,b])$ depends on both the
integrand and the closed interval over which the integral is taken. For example,
if $f \in \mathcal{R}[a,b]$, then, as we know, $f|_{[\alpha,\beta]} \in \mathcal{R}[\alpha,\beta]$ if $[\alpha,\beta] \subset [a,b]$, that is, the
integral $\int_\alpha^\beta f(x)\,dx$ is defined, and we can study its dependence on the closed
interval $[\alpha,\beta]$ of integration.

Lemma 1. If $a < b < c$ and $f \in \mathcal{R}[a,c]$, then $f|_{[a,b]} \in \mathcal{R}[a,b]$, $f|_{[b,c]} \in \mathcal{R}[b,c]$,
and the following equality³ holds:

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx. \tag{6.13}$$


Proof. We first note that the integrability of the restrictions of $f$ to the closed
intervals $[a,b]$ and $[b,c]$ is guaranteed by Proposition 4 of Sect. 6.1.

Next, since $f \in \mathcal{R}[a,c]$, in computing the integral $\int_a^c f(x)\,dx$ as the limit of
Riemann sums we may choose any convenient partitions of $[a,c]$. We shall now
consider only partitions $P$ of $[a,c]$ that contain the point $b$. Obviously any
such partition with distinguished points $(P,\xi)$ generates partitions $(P',\xi')$
and $(P'',\xi'')$ of $[a,b]$ and $[b,c]$ respectively, and $P = P' \cup P''$ and $\xi = \xi' \cup \xi''$.
But then the following equality holds between the corresponding Riemann
sums:

$$\sigma(f;P,\xi) = \sigma(f;P',\xi') + \sigma(f;P'',\xi'').$$

Since $\lambda(P') \le \lambda(P)$ and $\lambda(P'') \le \lambda(P)$, for $\lambda(P)$ sufficiently small, each
of these Riemann sums is close to the corresponding integral in (6.13), which
consequently must hold. □


To widen the application of this result slightly, we temporarily revert once 
again to the definition of the integral. 
We defined the integral as the limit of Riemann sums

$$\sigma(f;P,\xi) = \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i, \tag{6.14}$$

corresponding to partitions with distinguished points $(P,\xi)$ of the closed
interval of integration $[a,b]$. A partition $P$ consisted of a finite monotonic
sequence of points $x_0, x_1, \ldots, x_n$, the point $x_0$ being the lower limit of
integration $a$ and $x_n$ the upper limit of integration $b$. This construction was
carried out assuming that $a < b$. If we now take two arbitrary points $a$ and
$b$ without requiring $a < b$ and, regarding $a$ as the lower limit of integration
and $b$ as the upper, carry out this construction, we shall again obtain a sum
of the form (6.14), in which now $\Delta x_i > 0$ $(i = 1,\ldots,n)$ if $a < b$ and $\Delta x_i < 0$
$(i = 1,\ldots,n)$ if $a > b$, since $\Delta x_i = x_i - x_{i-1}$. Thus for $a > b$ the sum (6.14)
will differ from the Riemann sum of the corresponding partition of the closed
interval $[b,a]$ $(b < a)$ only in sign.


3 We recall that $f|_E$ denotes the restriction of the function $f$ to a set $E$ contained
in the domain of definition of $f$. Formally we should have written the restriction
of $f$ to the intervals $[a,b]$ and $[b,c]$, rather than $f$, on the right-hand side of Eq.
(6.13).

From these considerations we adopt the following convention: if $a > b$,
then

$$\int_a^b f(x)\,dx := -\int_b^a f(x)\,dx. \tag{6.15}$$

In this connection, it is also natural to set

$$\int_a^a f(x)\,dx = 0. \tag{6.16}$$


After these conventions, taking account of Lemma 1, we arrive at the 
following important property of the integral. 


Theorem 2. Let $a, b, c \in \mathbb{R}$ and let $f$ be a function integrable over the largest
closed interval having two of these points as endpoints. Then the restriction
of $f$ to each of the other closed intervals is also integrable over those intervals
and the following equality holds:

$$\int_a^b f(x)\,dx + \int_b^c f(x)\,dx + \int_c^a f(x)\,dx = 0. \tag{6.17}$$

Proof. By the symmetry of Eq. (6.17) in $a$, $b$, and $c$, we may assume without
loss of generality that $a = \min\{a,b,c\}$.
If $\max\{a,b,c\} = c$ and $a < b < c$, then by Lemma 1

$$\int_a^b f(x)\,dx + \int_b^c f(x)\,dx - \int_a^c f(x)\,dx = 0,$$

which, when we take account of the convention (6.15), yields (6.17).
If $\max\{a,b,c\} = b$ and $a < c < b$, then by Lemma 1

$$\int_a^c f(x)\,dx + \int_c^b f(x)\,dx - \int_a^b f(x)\,dx = 0,$$

which, when we take account of (6.15), again yields (6.17).
Finally, if two of the points $a$, $b$, and $c$ are equal, then (6.17) follows from
the conventions (6.15) and (6.16). □
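The sign convention and the cyclic identity (6.17) can be observed numerically. The following Python sketch is an illustration under stated assumptions: `oriented_integral` is a hypothetical helper that forms a midpoint Riemann sum whose increments $\Delta x_i = (b-a)/n$ are negative when $a > b$, and the integrand $\cos x$ and the three endpoints are arbitrary choices.

```python
import math


def oriented_integral(f, a, b, n=2000):
    """Midpoint Riemann sum from a to b; the increments (b - a)/n are negative when a > b."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


a, b, c = 0.5, 2.0, -1.0
cyclic = (oriented_integral(math.cos, a, b)
          + oriented_integral(math.cos, b, c)
          + oriented_integral(math.cos, c, a))
print(cyclic)  # close to 0, up to the discretization error of the Riemann sums

# Interchanging the limits changes only the sign of the sum, as in convention (6.15).
print(oriented_integral(math.cos, b, a) + oriented_integral(math.cos, a, b))
```

The cyclic sum vanishes only approximately here because each term is a finite Riemann sum rather than the integral itself.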


Definition 1. Suppose that to each ordered pair $(\alpha,\beta)$ of points $\alpha, \beta \in [a,b]$
a number $I(\alpha,\beta)$ is assigned so that

$$I(\alpha,\gamma) = I(\alpha,\beta) + I(\beta,\gamma)$$

for any triple of points $\alpha, \beta, \gamma \in [a,b]$.
Then the function $I(\alpha,\beta)$ is called an additive (oriented) interval function
defined on intervals contained in $[a,b]$.

If $f \in \mathcal{R}[A,B]$, and $a, b, c \in [A,B]$, then, setting $I(a,b) = \int_a^b f(x)\,dx$, we
conclude from (6.17) that

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx, \tag{6.18}$$

that is, the integral is an additive interval function on the interval of
integration. The orientation of the interval in this case amounts to the fact that we
order the pair of endpoints of the interval by indicating which is to be first 
(the lower limit of integration) and which is to be second (the upper limit of 
integration). 


6.2.3 Estimation of the Integral, Monotonicity of the Integral, 
and the Mean-value Theorem 


a. A General Estimate of the Integral We begin with a general
estimate of the integral, which, as will become clear later, holds for integrals of
functions that are not necessarily real-valued.

Theorem 3. If $a \le b$ and $f \in \mathcal{R}[a,b]$, then $|f| \in \mathcal{R}[a,b]$ and the following
inequality holds:

$$\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f|(x)\,dx. \tag{6.19}$$

If $|f|(x) \le C$ on $[a,b]$ then

$$\int_a^b |f|(x)\,dx \le C(b - a). \tag{6.20}$$


Proof. For $a = b$ the assertion is trivial, and so we shall assume that $a < b$.

To prove the theorem it now suffices to recall that $|f| \in \mathcal{R}[a,b]$ (see
Proposition 4 of Sect. 6.1), and write the following estimate for the Riemann
sum $\sigma(f;P,\xi)$:

$$\left| \sum_{i=1}^{n} f(\xi_i)\,\Delta x_i \right| \le \sum_{i=1}^{n} |f(\xi_i)|\,|\Delta x_i| = \sum_{i=1}^{n} |f(\xi_i)|\,\Delta x_i \le C \sum_{i=1}^{n} \Delta x_i = C(b - a).$$

Passing to the limit as $\lambda(P) \to 0$, we obtain

$$\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f|(x)\,dx \le C(b - a).$$ □

b. Monotonicity of the Integral and the First Mean-value Theorem
The results that follow are specific to integrals of real-valued functions.

Theorem 4. If $a \le b$, $f_1, f_2 \in \mathcal{R}[a,b]$, and $f_1(x) \le f_2(x)$ at each point
$x \in [a,b]$, then

$$\int_a^b f_1(x)\,dx \le \int_a^b f_2(x)\,dx. \tag{6.21}$$


Proof. For $a = b$ the assertion is trivial. If $a < b$, it suffices to write the
following inequality for the Riemann sums:

$$\sum_{i=1}^{n} f_1(\xi_i)\,\Delta x_i \le \sum_{i=1}^{n} f_2(\xi_i)\,\Delta x_i,$$

which is valid since $\Delta x_i > 0$ $(i = 1,\ldots,n)$, and then pass to the limit as
$\lambda(P) \to 0$. □


Theorem 4 can be interpreted as asserting that the integral is monotonic 
as a function of the integrand. 
Theorem 4 has a number of useful corollaries. 


Corollary 1. If $a \le b$, $f \in \mathcal{R}[a,b]$, and $m \le f(x) \le M$ at each $x \in [a,b]$,
then

$$m \cdot (b - a) \le \int_a^b f(x)\,dx \le M \cdot (b - a), \tag{6.22}$$

and, in particular, if $0 \le f(x)$ on $[a,b]$, then

$$0 \le \int_a^b f(x)\,dx.$$

Proof. Relation (6.22) is obtained by integrating each term in the inequality
$m \le f(x) \le M$ and using Theorem 4. □


Corollary 2. If $f \in \mathcal{R}[a,b]$, $m = \inf_{x\in[a,b]} f(x)$, and $M = \sup_{x\in[a,b]} f(x)$, then
there exists a number $\mu \in [m,M]$ such that

$$\int_a^b f(x)\,dx = \mu \cdot (b - a). \tag{6.23}$$

Proof. If $a = b$, the assertion is trivial. If $a \ne b$, we set $\mu = \frac{1}{b-a} \int_a^b f(x)\,dx$. It
then follows from (6.22) that $m \le \mu \le M$ if $a < b$. But both sides of (6.23)
reverse sign if $a$ and $b$ are interchanged, and therefore (6.23) is also valid for
$b < a$. □

Corollary 3. If $f \in C[a,b]$, there is a point $\xi \in [a,b]$ such that

$$\int_a^b f(x)\,dx = f(\xi)(b - a). \tag{6.24}$$

Proof. By the intermediate-value theorem for a continuous function, there is
a point $\xi$ on $[a,b]$ at which $f(\xi) = \mu$ if

$$m = \min_{x\in[a,b]} f(x) \le \mu \le \max_{x\in[a,b]} f(x) = M.$$

Therefore (6.24) follows from (6.23). □


The equality (6.24) is often called the first mean-value theorem for the 
integral. We, however, reserve that name for the following somewhat more 
general proposition. 


Theorem 5. (First mean-value theorem for the integral). Let $f, g \in \mathcal{R}[a,b]$,
$m = \inf_{x\in[a,b]} f(x)$, and $M = \sup_{x\in[a,b]} f(x)$. If $g$ is nonnegative (or nonpositive)
on $[a,b]$, then

$$\int_a^b (f \cdot g)(x)\,dx = \mu \int_a^b g(x)\,dx, \tag{6.25}$$

where $\mu \in [m,M]$.

If, in addition, it is known that $f \in C[a,b]$, then there exists a point
$\xi \in [a,b]$ such that

$$\int_a^b (f \cdot g)(x)\,dx = f(\xi) \int_a^b g(x)\,dx. \tag{6.26}$$


Proof. Since interchanging the limits of integration leads to a simultaneous
sign reversal on both sides of Eq. (6.25), it suffices to verify this equality for
the case $a < b$. Reversing the sign of $g(x)$ also reverses the signs of both sides
of (6.25), so that we may assume without loss of generality that $g(x) \ge 0$ on
$[a,b]$.
Since $m = \inf_{x\in[a,b]} f(x)$ and $M = \sup_{x\in[a,b]} f(x)$, we have, for $g(x) \ge 0$,

$$m\,g(x) \le f(x)\,g(x) \le M\,g(x).$$

Since $m \cdot g \in \mathcal{R}[a,b]$, $f \cdot g \in \mathcal{R}[a,b]$, and $M \cdot g \in \mathcal{R}[a,b]$, applying Theorem 4
and Theorem 1, we obtain

$$m \int_a^b g(x)\,dx \le \int_a^b f(x)\,g(x)\,dx \le M \int_a^b g(x)\,dx. \tag{6.27}$$

If $\int_a^b g(x)\,dx = 0$, it is obvious from these inequalities that (6.25) holds.

If $\int_a^b g(x)\,dx \ne 0$, then, setting

$$\mu = \left( \int_a^b g(x)\,dx \right)^{-1} \cdot \int_a^b (f \cdot g)(x)\,dx,$$

we find by (6.27) that

$$m \le \mu \le M,$$

but this is equivalent to (6.25).

The equality (6.26) now follows from (6.25) and the intermediate-value
theorem for a function $f \in C[a,b]$, if we take account of the fact that when
$f \in C[a,b]$, we have

$$m = \min_{x\in[a,b]} f(x) \quad \text{and} \quad M = \max_{x\in[a,b]} f(x).$$ □

We remark that (6.23) results from (6.25) if $g(x) \equiv 1$ on $[a,b]$.
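A concrete instance of the first mean-value theorem can be worked out numerically. The Python sketch below assumes $f(x) = x$ and the nonnegative weight $g(x) = x^2$ on $[0,1]$, and approximates the integrals by midpoint Riemann sums with a hypothetical helper `midpoint_integral`; none of these choices come from the text.

```python
def midpoint_integral(f, a, b, n=1000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


f = lambda x: x          # continuous, with m = 0 and M = 1 on [0, 1]
g = lambda x: x * x      # nonnegative weight

mu = midpoint_integral(lambda x: f(x) * g(x), 0.0, 1.0) / midpoint_integral(g, 0.0, 1.0)
xi = mu                  # since f(x) = x, the point with f(xi) = mu is xi = mu itself
print(mu, xi)            # mu lies between m = 0 and M = 1
```

Here $\mu$ is a weighted average of the values of $f$, with more weight near $x = 1$ where $g$ is larger, which is why it lands above the plain average $1/2$.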


c. The Second Mean-value Theorem for the Integral The so-called
second mean-value theorem⁴ is significantly more special and delicate in the
context of the Riemann integral.

So as not to complicate the proof of this theorem, we shall carry out a 
useful preparatory discussion that is of independent interest. 


Abel’s transformation. This is the name given to the following transformation
of the sum $\sum_{i=1}^{n} a_i b_i$. Let $A_k = \sum_{i=1}^{k} a_i$; we also set $A_0 = 0$. Then

$$\sum_{i=1}^{n} a_i b_i = \sum_{i=1}^{n} (A_i - A_{i-1}) b_i = \sum_{i=1}^{n} A_i b_i - \sum_{i=1}^{n} A_{i-1} b_i =$$
$$= \sum_{i=1}^{n} A_i b_i - \sum_{i=0}^{n-1} A_i b_{i+1} = A_n b_n - A_0 b_1 + \sum_{i=1}^{n-1} A_i (b_i - b_{i+1}).$$

Thus

$$\sum_{i=1}^{n} a_i b_i = (A_n b_n - A_0 b_1) + \sum_{i=1}^{n-1} A_i (b_i - b_{i+1}), \tag{6.28}$$

or, since $A_0 = 0$,

$$\sum_{i=1}^{n} a_i b_i = A_n b_n + \sum_{i=1}^{n-1} A_i (b_i - b_{i+1}). \tag{6.29}$$

4 Under an additional hypothesis on the function, one that is often completely
acceptable, Theorem 6 in this section could easily be obtained from the first
mean-value theorem. On this point, see Problem 3 at the end of Sect. 6.3.


Abel’s transformation provides an easy verification of the following 
lemma. 


Lemma 2. If the numbers $A_k = \sum_{i=1}^{k} a_i$ $(k = 1,\ldots,n)$ satisfy the inequalities
$m \le A_k \le M$ and the numbers $b_i$ $(i = 1,\ldots,n)$ are nonnegative and $b_i \ge b_{i+1}$
for $i = 1,\ldots,n-1$, then

$$m b_1 \le \sum_{i=1}^{n} a_i b_i \le M b_1. \tag{6.30}$$

Proof. Using the fact that $b_n \ge 0$ and $b_i - b_{i+1} \ge 0$ for $i = 1,\ldots,n-1$, we
obtain from (6.29),

$$\sum_{i=1}^{n} a_i b_i \le M b_n + \sum_{i=1}^{n-1} M (b_i - b_{i+1}) = M b_n + M(b_1 - b_n) = M b_1.$$

The left-hand inequality of (6.30) is verified similarly. □
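Both Abel's transformation (6.29) and the bounds $m b_1 \le \sum a_i b_i \le M b_1$ of the lemma are easy to check on a small integer example. In the Python sketch below, the function name `abel_transform` and the sample sequences are assumptions chosen for illustration; the sequence `b` is nonnegative and nonincreasing, as the lemma requires.

```python
from itertools import accumulate


def abel_transform(a, b):
    """Right-hand side of (6.29): A_n b_n + sum_{i=1}^{n-1} A_i (b_i - b_{i+1})."""
    A = list(accumulate(a))  # A[k-1] holds the partial sum A_k = a_1 + ... + a_k
    return A[-1] * b[-1] + sum(A[i] * (b[i] - b[i + 1]) for i in range(len(a) - 1))


a = [3, -1, 4, -1, 5]
b = [9, 7, 7, 2, 0]          # nonnegative and nonincreasing
direct = sum(x * y for x, y in zip(a, b))
partial = list(accumulate(a))
m, M = min(partial), max(partial)

print(direct, abel_transform(a, b))   # both ways of summing give the same value
print(m * b[0], direct, M * b[0])     # the sum lies between m*b_1 and M*b_1
```

The point of the transformation is visible in the code: the sum is re-expressed through the partial sums `A`, which are the quantities the lemma bounds.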


Lemma 3. If $f \in \mathcal{R}[a,b]$, then for any $x \in [a,b]$ the function

$$F(x) = \int_a^x f(t)\,dt \tag{6.31}$$

is defined and $F(x) \in C[a,b]$.


Proof. The existence of the integral in (6.31) for any $x \in [a,b]$ is already
known from Proposition 4 of Sect. 6.1; therefore it remains only for us to verify
that the function $F(x)$ is continuous. Since $f \in \mathcal{R}[a,b]$, we have $|f| \le C < \infty$
on $[a,b]$. Let $x \in [a,b]$ and $x + h \in [a,b]$. Then, by the additivity of the
integral and inequalities (6.19) and (6.20) we obtain

$$|F(x+h) - F(x)| = \left| \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt \right| = \left| \int_x^{x+h} f(t)\,dt \right| \le \left| \int_x^{x+h} |f(t)|\,dt \right| \le C|h|.$$

Here we have used inequality (6.20) taking account of the fact that for
$h < 0$ we have

$$\left| \int_x^{x+h} f(t)\,dt \right| = \left| -\int_{x+h}^{x} f(t)\,dt \right| = \left| \int_{x+h}^{x} f(t)\,dt \right|.$$

Thus we have shown that if $x$ and $x + h$ both belong to $[a,b]$, then

$$|F(x+h) - F(x)| \le C|h|, \tag{6.32}$$

from which it obviously follows that the function $F$ is continuous at each
point of $[a,b]$. □


We now prove a lemma that is a version of the second mean-value theorem. 


Lemma 4. If $f, g \in \mathcal{R}[a,b]$ and $g$ is a nonnegative nonincreasing function
on $[a,b]$ then there exists a point $\xi \in [a,b]$ such that

$$\int_a^b (f \cdot g)(x)\,dx = g(a) \int_a^\xi f(x)\,dx. \tag{6.33}$$


Before turning to the proof, we note that, in contrast to relation (6.26) of 
the first mean-value theorem, it is the function f(x) that remains under the 
integral sign in (6.33), not the monotonic function g. 


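Proof. To prove (6.33), as in the cases considered above, we attempt to
estimate the corresponding Riemann sum.
Let $P$ be a partition of $[a,b]$. We first write the identity

$$\int_a^b (f \cdot g)(x)\,dx = \sum_{i=1}^{n} \int_{x_{i-1}}^{x_i} (f \cdot g)(x)\,dx =$$
$$= \sum_{i=1}^{n} g(x_{i-1}) \int_{x_{i-1}}^{x_i} f(x)\,dx + \sum_{i=1}^{n} \int_{x_{i-1}}^{x_i} \bigl( g(x) - g(x_{i-1}) \bigr) f(x)\,dx$$

and then show that the last sum tends to zero as $\lambda(P) \to 0$.
Since $f \in \mathcal{R}[a,b]$, it follows that $|f(x)| \le C < \infty$ on $[a,b]$. Then, using
the properties of the integral already proved, we obtain

$$\left| \sum_{i=1}^{n} \int_{x_{i-1}}^{x_i} \bigl( g(x) - g(x_{i-1}) \bigr) f(x)\,dx \right| \le \sum_{i=1}^{n} \int_{x_{i-1}}^{x_i} |g(x) - g(x_{i-1})|\,|f(x)|\,dx \le$$
$$\le C \sum_{i=1}^{n} \int_{x_{i-1}}^{x_i} |g(x) - g(x_{i-1})|\,dx \le C \sum_{i=1}^{n} \omega(g;\Delta_i)\,\Delta x_i \to 0$$

as $\lambda(P) \to 0$, because $g \in \mathcal{R}[a,b]$ (see Proposition 2 of Sect. 6.1). Therefore

$$\int_a^b (f \cdot g)(x)\,dx = \lim_{\lambda(P)\to 0} \sum_{i=1}^{n} g(x_{i-1}) \int_{x_{i-1}}^{x_i} f(x)\,dx. \tag{6.34}$$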


We now estimate the sum on the right-hand side of (6.34). Setting

$$F(x) = \int_a^x f(t)\,dt,$$

by Lemma 3 we obtain a continuous function on $[a,b]$.
Let

$$m = \min_{x\in[a,b]} F(x) \quad \text{and} \quad M = \max_{x\in[a,b]} F(x).$$

Since $\int_{x_{i-1}}^{x_i} f(x)\,dx = F(x_i) - F(x_{i-1})$, it follows that

$$\sum_{i=1}^{n} g(x_{i-1}) \int_{x_{i-1}}^{x_i} f(x)\,dx = \sum_{i=1}^{n} \bigl( F(x_i) - F(x_{i-1}) \bigr) g(x_{i-1}). \tag{6.35}$$

Taking account of the fact that $g$ is nonnegative and nonincreasing on
$[a,b]$, and setting

$$a_i = F(x_i) - F(x_{i-1}), \qquad b_i = g(x_{i-1}),$$

we find by Lemma 2 that

$$m\,g(a) \le \sum_{i=1}^{n} \bigl( F(x_i) - F(x_{i-1}) \bigr) g(x_{i-1}) \le M\,g(a), \tag{6.36}$$

since

$$A_k = \sum_{i=1}^{k} a_i = F(x_k) - F(x_0) = F(x_k) - F(a) = F(x_k).$$

Having now shown that the sums (6.35) satisfy the inequalities (6.36),
and recalling relation (6.34), we have

$$m\,g(a) \le \int_a^b (f \cdot g)(x)\,dx \le M\,g(a). \tag{6.37}$$


If $g(a) = 0$, then, as inequalities (6.37) show, the relation to be proved
(6.33) is obviously true.

If $g(a) > 0$, we set

$$\mu = \frac{1}{g(a)} \int_a^b (f \cdot g)(x)\,dx.$$

It follows from (6.37) that $m \le \mu \le M$, and from the continuity of
$F(x) = \int_a^x f(t)\,dt$ on $[a,b]$ that there exists a point $\xi \in [a,b]$ at which $F(\xi) = \mu$.
But that is precisely what formula (6.33) says. □
Theorem 6. (Second mean-value theorem for the integral). If $f, g \in \mathcal{R}[a,b]$
and $g$ is a monotonic function on $[a,b]$, then there exists a point $\xi \in [a,b]$
such that

$$\int_a^b (f \cdot g)(x)\,dx = g(a) \int_a^\xi f(x)\,dx + g(b) \int_\xi^b f(x)\,dx. \tag{6.38}$$

The equality (6.38) (like (6.33), as it happens) is often called Bonnet’s
formula.⁵


Proof. Let $g$ be a nondecreasing function on $[a,b]$. Then $G(x) = g(b) - g(x)$ is
nonnegative, nonincreasing, and integrable on $[a,b]$. Applying formula (6.33),
we find

$$\int_a^b (f \cdot G)(x)\,dx = G(a) \int_a^\xi f(x)\,dx. \tag{6.39}$$

But

$$\int_a^b (f \cdot G)(x)\,dx = g(b) \int_a^b f(x)\,dx - \int_a^b (f \cdot g)(x)\,dx,$$

$$G(a) \int_a^\xi f(x)\,dx = g(b) \int_a^\xi f(x)\,dx - g(a) \int_a^\xi f(x)\,dx.$$

Taking account of these relations and the additivity of the integral, we
obtain the equality (6.38), which was to be proved, from (6.39).

If $g$ is a nonincreasing function, setting $G(x) = g(x) - g(b)$, we find that
$G(x)$ is a nonnegative, nonincreasing, integrable function on $[a,b]$. We then
obtain (6.39) again, and then formula (6.38). □
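The theorem asserts only that a suitable point $\xi$ exists, but in a concrete case it can be located numerically. The Python sketch below assumes $f(x) = \cos x$ and the nondecreasing function $g(x) = x$ on $[0,\pi]$, approximates all integrals by midpoint Riemann sums, and finds $\xi$ by bisection; the helper names `midpoint_integral` and `defect` and the bracketing interval $[0,\pi/2]$ are choices made for this example, not part of the text.

```python
import math


def midpoint_integral(f, a, b, n=2000):
    """Midpoint Riemann sum approximating the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


f, g = math.cos, (lambda x: x)   # g is nondecreasing on [0, pi]
a, b = 0.0, math.pi
target = midpoint_integral(lambda x: f(x) * g(x), a, b)


def defect(xi):
    """Right-hand side of (6.38) minus its left-hand side for a trial point xi."""
    return (g(a) * midpoint_integral(f, a, xi)
            + g(b) * midpoint_integral(f, xi, b) - target)


lo, hi = a, math.pi / 2          # defect changes sign on this interval for the chosen f, g
for _ in range(40):              # plain bisection
    mid = (lo + hi) / 2
    if defect(lo) * defect(mid) <= 0:
        hi = mid
    else:
        lo = mid
xi = (lo + hi) / 2
print(xi)
```

The point found this way need not be unique: for these particular functions the defect is symmetric about $\pi/2$, so a second admissible point lies in $[\pi/2,\pi]$.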


5 P. O. Bonnet (1819-1892), French mathematician and astronomer. His most
important mathematical works are in differential geometry.


6.2.4 Problems and Exercises


1. Show that if $f \in \mathcal{R}[a,b]$ and $f(x) \ge 0$ on $[a,b]$, then the following statements
are true.

a) If the function $f(x)$ assumes a positive value $f(x_0) > 0$ at a point of continuity
$x_0 \in [a,b]$, then the strict inequality

$$\int_a^b f(x)\,dx > 0$$

holds.

b) The condition $\int_a^b f(x)\,dx = 0$ implies that $f(x) = 0$ at almost all points of
$[a,b]$.
2. Show that if f € R[a, b], m = inf f(z), and M = a f(x), then 


b 
a) f f(x) dx = u(b — a), where u € [m, M] (see Problem 5a of Sect. 6.1); 


b) if f is continuous on [a,b], there exists a point € €]a, b| such that 


/ f(x) de = f(E\(b— a). 


3. Show that if f ∈ C[a,b], f(x) ≥ 0 on [a,b], and M = max f(x), then

$$\lim_{n \to \infty} \left( \int_a^b f^n(x)\,dx \right)^{1/n} = M.$$

4. a) Show that if f ∈ R[a,b], then |f|^p ∈ R[a,b] for p ≥ 0.
b) Starting from Hölder's inequality for sums, obtain Hölder's inequality for
integrals:⁶

$$\left| \int_a^b (f \cdot g)(x)\,dx \right| \le \left( \int_a^b |f|^p(x)\,dx \right)^{1/p} \cdot \left( \int_a^b |g|^q(x)\,dx \right)^{1/q},$$

if f, g ∈ R[a,b], p > 1, q > 1, and 1/p + 1/q = 1.


⁶ The algebraic Hölder inequality for p = q = 2 was first obtained in 1821 by
Cauchy and bears his name. Hölder's inequality for integrals with p = q = 2
was first discovered in 1859 by the Russian mathematician B. Ya. Bunyakovskii
(1804-1889). This important integral inequality (in the case p = q = 2) is called
Bunyakovskii's inequality or the Cauchy-Bunyakovskii inequality. One also
sometimes sees the less accurate name "Schwarz inequality" after the German
mathematician H. K. A. Schwarz (1843-1921), in whose work it appeared in 1884.

c) Starting from Minkowski's inequality for sums, obtain Minkowski's inequality
for integrals:

$$\left( \int_a^b |f + g|^p(x)\,dx \right)^{1/p} \le \left( \int_a^b |f|^p(x)\,dx \right)^{1/p} + \left( \int_a^b |g|^p(x)\,dx \right)^{1/p},$$

if f, g ∈ R[a,b] and p ≥ 1. Show that this inequality reverses direction if 0 < p < 1.

d) Verify that if f is a continuous convex function on R and φ an arbitrary
continuous function on R, then Jensen's inequality

$$f\left( \frac{1}{c} \int_0^c \varphi(t)\,dt \right) \le \frac{1}{c} \int_0^c f(\varphi(t))\,dt$$

holds for c ≠ 0.


6.3 The Integral and the Derivative 


6.3.1 The Integral and the Primitive 


Let f be a Riemann-integrable function on a closed interval [a,b]. On this 
interval let us consider the function 


$$F(x) = \int_a^x f(t)\,dt, \tag{6.40}$$


often called an integral with variable upper limit. 

Since f ∈ R[a,b], it follows that f|_{[a,x]} ∈ R[a,x] if [a,x] ⊂ [a,b]; therefore
the function x ↦ F(x) is unambiguously defined for x ∈ [a,b].

If |f(t)| ≤ C < +∞ on [a,b] (and f, being an integrable function, is
bounded on [a,b]), it follows from the additivity of the integral and the
elementary estimate of it that

$$|F(x + h) - F(x)| \le C|h|, \tag{6.41}$$

if x, x + h ∈ [a,b].

Actually, we already discussed this while proving Lemma 3 in the preceding
section.

It follows in particular from (6.41) that the function F is continuous on
[a,b], so that F ∈ C[a,b].

We now investigate the function F more thoroughly. 

The following lemma is fundamental for what follows. 


Lemma 1. If f ∈ R[a,b] and the function f is continuous at a point x ∈
[a,b], then the function F defined on [a,b] by (6.40) is differentiable at the
point x, and the following equality holds:

$$F'(x) = f(x).$$

Proof. Let x, x + h ∈ [a,b]. Let us estimate the difference F(x + h) - F(x). It
follows from the continuity of f at x that f(t) = f(x) + Δ(t), where Δ(t) → 0
as t → x, t ∈ [a,b]. If the point x is held fixed, the function Δ(t) = f(t) - f(x)
is integrable on [a,b], being the difference of the integrable function t ↦ f(t)
and the constant f(x). We denote by M(h) the quantity $\sup_{t \in I(h)} |\Delta(t)|$, where
I(h) is the closed interval with endpoints x, x + h ∈ [a,b]. By hypothesis
M(h) → 0 as h → 0.
We now write

$$\begin{aligned}
F(x+h) - F(x) &= \int_a^{x+h} f(t)\,dt - \int_a^{x} f(t)\,dt = \int_x^{x+h} f(t)\,dt = \\
&= \int_x^{x+h} \bigl(f(x) + \Delta(t)\bigr)\,dt = \int_x^{x+h} f(x)\,dt + \int_x^{x+h} \Delta(t)\,dt = f(x)h + \alpha(h)h,
\end{aligned}$$

where we have set

$$\int_x^{x+h} \Delta(t)\,dt = \alpha(h)h.$$

Since

$$\left| \int_x^{x+h} \Delta(t)\,dt \right| \le \left| \int_x^{x+h} |\Delta(t)|\,dt \right| \le \left| \int_x^{x+h} M(h)\,dt \right| = M(h)|h|,$$

it follows that |α(h)| ≤ M(h), and so α(h) → 0 as h → 0 (in such a way that
x + h ∈ [a,b]).

Thus we have shown that if the function f is continuous at a point x ∈
[a,b], then for displacements h from x such that x + h ∈ [a,b] the following
equality holds:

$$F(x+h) - F(x) = f(x)h + \alpha(h)h, \tag{6.42}$$

where α(h) → 0 as h → 0.
But this means that the function F(x) is differentiable on [a,b] at the
point x ∈ [a,b] and that F'(x) = f(x). □
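
The statement of Lemma 1 can be observed numerically. The following is only an illustrative sketch, not part of the proof: it assumes the sample integrand f = cos on [0, 1], and the helper names `simpson`, `F`, and `difference_quotient` are hypothetical. The integral with variable upper limit is approximated by a composite Simpson rule, and the difference quotient (F(x + h) - F(x))/h is compared with f(x) for decreasing h.

```python
import math

def simpson(f, a, b, steps=200):
    """Composite Simpson rule for the integral of f over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def F(x, f=math.cos, a=0.0):
    """Integral with variable upper limit, F(x) = int_a^x f(t) dt, as in (6.40)."""
    return simpson(f, a, x)

def difference_quotient(x, h):
    """(F(x + h) - F(x)) / h, which by Lemma 1 should tend to f(x) as h -> 0."""
    return (F(x + h) - F(x)) / h

x = 0.7
for h in (1e-1, 1e-2, 1e-3):
    q = difference_quotient(x, h)
    print(f"h = {h:g}: quotient = {q:.6f}, deviation from f(x) = {abs(q - math.cos(x)):.2e}")
```

The printed deviation should shrink roughly in proportion to h, in agreement with (6.42).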


A very important immediate corollary of Lemma 1 is the following. 


Theorem 1. Every continuous function f : [a,b] → R on the closed interval
[a,b] has a primitive, and every primitive of f on [a,b] has the form

$$\mathcal{F}(x) = \int_a^x f(t)\,dt + c, \tag{6.43}$$

where c is a constant.

Proof. We have the implication (f ∈ C[a,b]) ⇒ (f ∈ R[a,b]), so that by
Lemma 1 the function (6.40) is a primitive for f on [a,b]. But two primitives
ℱ(x) and F(x) of the same function on a closed interval can differ on that
interval only by a constant; hence ℱ(x) = F(x) + c. □


For later applications it is convenient to broaden the concept of primitive
slightly and adopt the following definition.


Definition 1. A continuous function x ↦ ℱ(x) on an interval of the real
line is called a primitive (or generalized primitive) of the function x ↦ f(x)
defined on the same interval if the relation ℱ'(x) = f(x) holds at all points
of the interval, with only a finite number of exceptions.


Taking this definition into account, we can assert that the following the- 
orem holds. 


Theorem 1'. A function f : [a,b] → R that is defined and bounded on a
closed interval [a,b] and has only a finite number of points of discontinuity
has a (generalized) primitive on that interval, and any primitive of f on [a,b]
has the form (6.43).


Proof. Since f has only a finite set of points of discontinuity, f ∈ R[a,b], and
by Lemma 1 the function (6.40) is a generalized primitive for f on [a,b]. Here
we have taken into account, as already pointed out, the fact that by (6.41)
the function (6.40) is continuous on [a,b]. If ℱ(x) is another primitive of f
on [a,b], then ℱ(x) - F(x) is a continuous function and constant on each of
the finite number of intervals into which the discontinuities of f divide the
closed interval [a,b]. But it then follows from the continuity of ℱ(x) - F(x)
on all of [a,b] that ℱ(x) - F(x) = const on [a,b]. □


6.3.2 The Newton-Leibniz Formula


Theorem 2. If f : [a,b] → R is a bounded function with a finite number of
points of discontinuity, then f ∈ R[a,b] and

$$\int_a^b f(x)\,dx = \mathcal{F}(b) - \mathcal{F}(a), \tag{6.44}$$

where ℱ : [a,b] → R is any primitive of f on [a,b].


Proof. We already know that a bounded function on a closed interval having
only a finite number of discontinuities is integrable (see Corollary 2 after
Proposition 2 in Sect. 6.1). The existence of a generalized primitive ℱ(x) of
the function f on [a,b] is guaranteed by Theorem 1', by virtue of which ℱ(x)
has the form (6.43). Setting x = a in (6.43), we find that ℱ(a) = c, and so

$$\mathcal{F}(x) = \int_a^x f(t)\,dt + \mathcal{F}(a).$$

In particular

$$\int_a^b f(t)\,dt = \mathcal{F}(b) - \mathcal{F}(a),$$

which, up to the notation for the variable of integration, is exactly formula
(6.44), which was to be proved. □


Relation (6.44), which is fundamental for all of analysis, is called the
Newton-Leibniz formula (or the fundamental theorem of calculus).
The difference ℱ(b) - ℱ(a) of values of any function is often written
$\mathcal{F}(x)\big|_a^b$. In this notation, the Newton-Leibniz formula assumes the form

$$\int_a^b f(x)\,dx = \mathcal{F}(x)\Big|_a^b.$$


Since both sides of the formula reverse sign when a and b are interchanged, 
the formula is valid for any relation between the magnitudes of a and b, that 
is, both for a < b and for a > b. 
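
As a numerical illustration of (6.44) and of the remark on interchanging the limits, one can compare a quadrature value with the difference of values of a known primitive. This is a sketch under stated assumptions: the integrand 1/(1 + x²) with primitive arctan x is chosen only as an example, and the helper name `simpson` is hypothetical.

```python
import math

def simpson(f, a, b, steps=100):
    """Composite Simpson rule for the integral of f from a to b; steps must be even.
    The step h = (b - a) / steps is negative when a > b, so the sign flips with the limits."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

f = lambda x: 1.0 / (1.0 + x * x)   # sample integrand (an assumption of this sketch)
primitive = math.atan                # a primitive of f on the whole real line

forward = simpson(f, 0.0, 1.0)       # case a < b
backward = simpson(f, 1.0, 0.0)      # case a > b

print(forward, primitive(1.0) - primitive(0.0))
print(backward, primitive(0.0) - primitive(1.0))
```

Both printed pairs should agree to many digits, and the two quadrature values should differ only in sign.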

In exercises of analysis the Newton-Leibniz formula is mostly used to
compute the integral on the left-hand side, and that may lead to a somewhat
distorted idea of its use. The actual situation is that particular integrals
are rarely found using a primitive; more often one resorts to direct
computation on a computer using highly developed numerical methods. The
Newton-Leibniz formula occupies a key position in the theory of mathematical
analysis itself, since it links integration and differentiation. In analysis
it has a very far-reaching extension in the form of the so-called generalized
Stokes' formula.⁷

An example of the use of the Newton-Leibniz formula in analysis itself is
provided by the material in the next subsection.


6.3.3 Integration by Parts in the Definite Integral 
and Taylor’s Formula 


Proposition 1. If the functions u(x) and v(x) are continuously differentiable
on a closed interval with endpoints a and b, then

$$\int_a^b (u \cdot v')(x)\,dx = (u \cdot v)\Big|_a^b - \int_a^b (v \cdot u')(x)\,dx. \tag{6.45}$$

It is customary to write this formula in abbreviated form as

$$\int_a^b u\,dv = u \cdot v\Big|_a^b - \int_a^b v\,du$$

and call it the formula for integration by parts in the definite integral.

⁷ G. G. Stokes (1819-1903), British physicist and mathematician.


Proof. By the rule for differentiating a product of functions, we have

$$(u \cdot v)'(x) = (u' \cdot v)(x) + (u \cdot v')(x).$$

By hypothesis, all the functions in this last equality are continuous, and hence
integrable on the interval with endpoints a and b. Using the linearity of the
integral and the Newton-Leibniz formula, we obtain

$$(u \cdot v)(x)\Big|_a^b = \int_a^b (u' \cdot v)(x)\,dx + \int_a^b (u \cdot v')(x)\,dx. \qquad \square$$


As a corollary we now obtain the Taylor formula with integral form of the
remainder.

Suppose on the closed interval with endpoints a and x the function t ↦
f(t) has n continuous derivatives. Using the Newton-Leibniz formula and
formula (6.45), we carry out the following chain of transformations, in which
all differentiations and substitutions are carried out on the variable t:

$$\begin{aligned}
f(x) - f(a) &= \int_a^x f'(t)\,dt = -\int_a^x f'(t)(x-t)'\,dt = \\
&= -f'(t)(x-t)\Big|_a^x + \int_a^x f''(t)(x-t)\,dt = \\
&= f'(a)(x-a) - \frac{1}{2}\int_a^x f''(t)\bigl((x-t)^2\bigr)'\,dt = \\
&= f'(a)(x-a) - \frac{1}{2} f''(t)(x-t)^2\Big|_a^x + \frac{1}{2}\int_a^x f'''(t)(x-t)^2\,dt = \\
&= f'(a)(x-a) + \frac{1}{2} f''(a)(x-a)^2 - \frac{1}{2\cdot 3}\int_a^x f'''(t)\bigl((x-t)^3\bigr)'\,dt = \cdots = \\
&= f'(a)(x-a) + \frac{1}{2} f''(a)(x-a)^2 + \cdots + \frac{1}{2\cdot 3\cdots(n-1)} f^{(n-1)}(a)(x-a)^{n-1} + r_{n-1}(a;x),
\end{aligned}$$

where

$$r_{n-1}(a;x) = \frac{1}{(n-1)!}\int_a^x f^{(n)}(t)(x-t)^{n-1}\,dt. \tag{6.46}$$

Thus we have proved the following proposition.


Proposition 2. If the function t ↦ f(t) has continuous derivatives up to
order n inclusive on the closed interval with endpoints a and x, then Taylor's
formula holds:

$$f(x) = f(a) + \frac{1}{1!} f'(a)(x-a) + \cdots + \frac{1}{(n-1)!} f^{(n-1)}(a)(x-a)^{n-1} + r_{n-1}(a;x)$$

with remainder term r_{n-1}(a; x) represented in the integral form (6.46).
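
Proposition 2 lends itself to a direct numerical check. The sketch below is illustrative only: it assumes f = exp (so every derivative is again exp), the values a = 0, x = 1, n = 4, and hypothetical helper names `simpson`, `taylor_polynomial`, and `integral_remainder`. It evaluates the remainder (6.46) by quadrature and adds it to the Taylor polynomial of degree n - 1.

```python
import math

def simpson(f, a, b, steps=200):
    """Composite Simpson rule for the integral of f over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def taylor_polynomial(derivs_at_a, a, x):
    """Sum of f^(k)(a) / k! * (x - a)^k over the supplied list [f(a), f'(a), ...]."""
    return sum(d / math.factorial(k) * (x - a) ** k for k, d in enumerate(derivs_at_a))

def integral_remainder(nth_derivative, a, x, n):
    """r_{n-1}(a; x) = 1/(n-1)! * int_a^x f^(n)(t) (x - t)^(n-1) dt, as in (6.46)."""
    integrand = lambda t: nth_derivative(t) * (x - t) ** (n - 1)
    return simpson(integrand, a, x) / math.factorial(n - 1)

a, x, n = 0.0, 1.0, 4
poly = taylor_polynomial([math.exp(a)] * n, a, x)   # terms of order 0, ..., n - 1
rem = integral_remainder(math.exp, a, x, n)         # f^(n) = exp for this choice of f

print(poly + rem, math.exp(x))
```

Up to quadrature error, the sum of the polynomial and the integral remainder should reproduce exp(1).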


We note that the function (x - t)^{n-1} does not change sign on the closed
interval with endpoints a and x, and since t ↦ f^{(n)}(t) is continuous on that
interval, the first mean-value theorem implies that there exists a point ξ such
that

$$\begin{aligned}
r_{n-1}(a;x) &= \frac{1}{(n-1)!}\int_a^x f^{(n)}(t)(x-t)^{n-1}\,dt = \frac{1}{(n-1)!}\, f^{(n)}(\xi)\int_a^x (x-t)^{n-1}\,dt = \\
&= \frac{1}{(n-1)!}\, f^{(n)}(\xi)\left(-\frac{1}{n}(x-t)^n\right)\Bigg|_a^x = \frac{1}{n!}\, f^{(n)}(\xi)(x-a)^n.
\end{aligned}$$

We have again obtained the familiar Lagrange form of the remainder in
Taylor's theorem. By Problem 2d) of Sect. 6.2, we may assume that ξ lies in
the open interval with endpoints a and x.

This reasoning can be repeated, taking the expression f^{(n)}(ξ)(x - ξ)^{n-k},
where k ∈ [1,n], outside the integral in (6.46). The Cauchy and Lagrange
forms of the remainder term that result correspond to the values k = 1 and
k = n.


6.3.4 Change of Variable in an Integral 


One of the basic formulas of integral calculus is the formula for change of
variable in a definite integral. This formula is just as important in integration
theory as the formula for differentiating a composite function is in differential
calculus. Under certain conditions, the two formulas can be linked by the
Newton-Leibniz formula.

Proposition 3. If φ : [α, β] → [a,b] is a continuously differentiable mapping
of the closed interval α ≤ t ≤ β into the closed interval a ≤ x ≤ b such that
φ(α) = a and φ(β) = b, then for any continuous function f(x) on [a,b] the
function f(φ(t))φ'(t) is continuous on the closed interval [α, β], and

$$\int_a^b f(x)\,dx = \int_{\alpha}^{\beta} f(\varphi(t))\,\varphi'(t)\,dt. \tag{6.47}$$

Proof. Let ℱ(x) be a primitive of f(x) on [a,b]. Then, by the theorem on
differentiation of a composite function, the function ℱ(φ(t)) is a primitive
of the function f(φ(t))φ'(t), which is continuous, being the composition and
product of continuous functions on the closed interval [α, β]. By the
Newton-Leibniz formula $\int_a^b f(x)\,dx = \mathcal{F}(b) - \mathcal{F}(a)$ and
$\int_{\alpha}^{\beta} f(\varphi(t))\,\varphi'(t)\,dt = \mathcal{F}(\varphi(\beta)) - \mathcal{F}(\varphi(\alpha))$.
But by hypothesis φ(α) = a and φ(β) = b, so that Eq. (6.47) does
indeed hold. □


It is clear from formula (6.47) how convenient it is that we have not
just the symbol for the function, but the entire differential f(x) dx in the
symbol for integration, which makes it possible to obtain the correct integrand
automatically when the new variable x = φ(t) is substituted in the integral.

So as not to complicate matters with a cumbersome proof, in Proposition 3
we deliberately shrank the true range of applicability of (6.47) and obtained it
by the Newton-Leibniz formula. We now turn to the basic theorem on change
of variable, whose hypotheses differ somewhat from those of Proposition 3.
The proof of this theorem will rely directly on the definition of the integral
as the limit of Riemann sums.


Theorem 3. Let φ : [α, β] → [a,b] be a continuously differentiable strictly
monotonic mapping of the closed interval α ≤ t ≤ β into the closed interval
a ≤ x ≤ b with the correspondence φ(α) = a, φ(β) = b or φ(α) = b, φ(β) = a
at the endpoints. Then for any function f(x) that is integrable on [a,b] the
function f(φ(t))φ'(t) is integrable on [α, β] and

$$\int_{\varphi(\alpha)}^{\varphi(\beta)} f(x)\,dx = \int_{\alpha}^{\beta} f(\varphi(t))\,\varphi'(t)\,dt. \tag{6.48}$$

Proof. Since φ is a strictly monotonic mapping of [α, β] onto [a,b] with
endpoints corresponding to endpoints, every partition P_t (α = t_0 < ··· < t_n = β)
of the closed interval [α, β] generates a corresponding partition P_x of [a,b]
by means of the images x_i = φ(t_i) (i = 0, ..., n); the partition P_x may
be denoted φ(P_t). Here x_0 = a if φ(α) = a and x_0 = b if φ(α) = b. It
follows from the uniform continuity of φ on [α, β] that if λ(P_t) → 0, then
λ(P_x) = λ(φ(P_t)) also tends to zero.

Using Lagrange's theorem, we transform the Riemann sum σ(f; P_x, ξ) as
follows:

$$\sum_{i=1}^{n} f(\xi_i)\,\Delta x_i = \sum_{i=1}^{n} f(\xi_i)(x_i - x_{i-1}) = \sum_{i=1}^{n} f(\varphi(\tau_i))\,\varphi'(\tilde{\tau}_i)(t_i - t_{i-1}) = \sum_{i=1}^{n} f(\varphi(\tau_i))\,\varphi'(\tilde{\tau}_i)\,\Delta t_i.$$

Here x_i = φ(t_i), ξ_i = φ(τ_i), ξ_i lies in the closed interval with endpoints
x_{i-1} and x_i, and the points τ_i and τ̃_i lie in the interval with endpoints t_{i-1}
and t_i (i = 1, ..., n).

Next

$$\sum_{i=1}^{n} f(\varphi(\tau_i))\,\varphi'(\tilde{\tau}_i)\,\Delta t_i = \sum_{i=1}^{n} f(\varphi(\tau_i))\,\varphi'(\tau_i)\,\Delta t_i + \sum_{i=1}^{n} f(\varphi(\tau_i))\bigl(\varphi'(\tilde{\tau}_i) - \varphi'(\tau_i)\bigr)\Delta t_i.$$

Let us estimate this last sum. Since f ∈ R[a,b], the function f is bounded
on [a,b]. Let |f(x)| ≤ C on [a,b]. Then

$$\left| \sum_{i=1}^{n} f(\varphi(\tau_i))\bigl(\varphi'(\tilde{\tau}_i) - \varphi'(\tau_i)\bigr)\Delta t_i \right| \le C \cdot \sum_{i=1}^{n} \omega(\varphi'; \Delta_i)\,\Delta t_i,$$

where Δ_i is the closed interval with endpoints t_{i-1} and t_i.
This last sum tends to zero as λ(P_t) → 0, since φ' is continuous on [α, β].
Thus we have shown that

$$\sum_{i=1}^{n} f(\xi_i)\,\Delta x_i = \sum_{i=1}^{n} f(\varphi(\tau_i))\,\varphi'(\tau_i)\,\Delta t_i + \alpha,$$

where the error term α → 0 as λ(P_t) → 0. As already pointed out, if λ(P_t) → 0, then
λ(P_x) → 0 also. But f ∈ R[a,b], so that as λ(P_x) → 0 the sum on the left-hand
side of this last equality tends to the integral $\int_{\varphi(\alpha)}^{\varphi(\beta)} f(x)\,dx$. Hence as
λ(P_t) → 0 the right-hand side of the equality also has the same limit.
But the sum $\sum_{i=1}^{n} f(\varphi(\tau_i))\,\varphi'(\tau_i)\,\Delta t_i$ can be regarded as a completely
arbitrary Riemann sum for the function f(φ(t))φ'(t) corresponding to the
partition P_t with distinguished points τ = (τ_1, ..., τ_n), since in view of the strict
monotonicity of φ, any set of points τ can be obtained from some corresponding
set ξ = (ξ_1, ..., ξ_n) of distinguished points in the partition P_x = φ(P_t).
Thus, the limit of this sum is, by definition, the integral of the function
f(φ(t))φ'(t) over the closed interval [α, β], and we have simultaneously
proved both the integrability of f(φ(t))φ'(t) on [α, β] and formula (6.48). □


6.3.5 Some Examples 


Let us now consider some examples of the use of these formulas and the 
theorems on properties of the integral proved in the last two sections. 


Example 1.

$$\begin{aligned}
\int_{-1}^{1} \sqrt{1 - x^2}\,dx &= \int_{-\pi/2}^{\pi/2} \sqrt{1 - \sin^2 t}\,\cos t\,dt = \int_{-\pi/2}^{\pi/2} \cos^2 t\,dt = \\
&= \frac{1}{2}\int_{-\pi/2}^{\pi/2} (1 + \cos 2t)\,dt = \frac{1}{2}\left(t + \frac{1}{2}\sin 2t\right)\Bigg|_{-\pi/2}^{\pi/2} = \frac{\pi}{2}.
\end{aligned}$$

In computing this integral we made the change of variable x = sin t and
then, after finding a primitive for the integrand that resulted from this
substitution, we applied the Newton-Leibniz formula.

Of course, we could have proceeded differently. We could have found the
rather cumbersome primitive $\frac{1}{2}x\sqrt{1 - x^2} + \frac{1}{2}\arcsin x$ for the function $\sqrt{1 - x^2}$
and then used the Newton-Leibniz formula. This example shows that in
computing a definite integral one can fortunately sometimes avoid having
to find a primitive for the integrand.
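
The change of variable in Example 1 can also be checked numerically. The following sketch is illustrative (the helper name `simpson` and the number of steps are assumptions): it applies the same composite Simpson rule to both sides of the substitution x = sin t. Since the derivative of the integrand on the left is unbounded at x = ±1, one should expect the left-hand quadrature to converge noticeably more slowly than the right-hand one.

```python
import math

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule for the integral of f over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

# Left side: integral of sqrt(1 - x^2) over [-1, 1]; max() guards against rounding below zero.
lhs = simpson(lambda x: math.sqrt(max(0.0, 1.0 - x * x)), -1.0, 1.0)

# Right side after x = sin t: integral of cos^2 t over [-pi/2, pi/2].
rhs = simpson(lambda t: math.cos(t) ** 2, -math.pi / 2, math.pi / 2)

print(lhs, rhs, math.pi / 2)
```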


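Example 2. Let us show that

$$\text{a)}\ \int_{-\pi}^{\pi} \sin mx \cos nx\,dx = 0, \qquad \text{b)}\ \int_{-\pi}^{\pi} \sin^2 mx\,dx = \pi, \qquad \text{c)}\ \int_{-\pi}^{\pi} \cos^2 nx\,dx = \pi$$

for m, n ∈ N.

$$\text{a)}\ \int_{-\pi}^{\pi} \sin mx \cos nx\,dx = \frac{1}{2}\int_{-\pi}^{\pi} \bigl(\sin(n+m)x - \sin(n-m)x\bigr)\,dx = \frac{1}{2}\left(-\frac{1}{n+m}\cos(n+m)x + \frac{1}{n-m}\cos(n-m)x\right)\Bigg|_{-\pi}^{\pi} = 0,$$

if n - m ≠ 0. The case when n - m = 0 can be considered separately, and in
this case we obviously arrive at the same result.

$$\text{b)}\ \int_{-\pi}^{\pi} \sin^2 mx\,dx = \frac{1}{2}\int_{-\pi}^{\pi} (1 - \cos 2mx)\,dx = \frac{1}{2}\left(x - \frac{1}{2m}\sin 2mx\right)\Bigg|_{-\pi}^{\pi} = \pi.$$

$$\text{c)}\ \int_{-\pi}^{\pi} \cos^2 nx\,dx = \frac{1}{2}\int_{-\pi}^{\pi} (1 + \cos 2nx)\,dx = \frac{1}{2}\left(x + \frac{1}{2n}\sin 2nx\right)\Bigg|_{-\pi}^{\pi} = \pi.$$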


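Example 3. Let f ∈ R[-a, a]. We shall show that

$$\int_{-a}^{a} f(x)\,dx = \begin{cases} 2\displaystyle\int_0^a f(x)\,dx, & \text{if } f \text{ is an even function}, \\[1ex] 0, & \text{if } f \text{ is an odd function}. \end{cases}$$

If f(-x) = f(x), then

$$\begin{aligned}
\int_{-a}^{a} f(x)\,dx &= \int_{-a}^{0} f(x)\,dx + \int_0^a f(x)\,dx = \int_a^0 f(-t)\,d(-t) + \int_0^a f(x)\,dx = \\
&= \int_0^a f(-t)\,dt + \int_0^a f(x)\,dx = \int_0^a \bigl(f(-x) + f(x)\bigr)\,dx = 2\int_0^a f(x)\,dx.
\end{aligned}$$

If f(-x) = -f(x), we obtain from the same computations that

$$\int_{-a}^{a} f(x)\,dx = \int_0^a \bigl(f(-x) + f(x)\bigr)\,dx = \int_0^a 0\,dx = 0.$$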


Example 4. Let f be a function defined on the entire real line R and having
period T, that is f(x + T) = f(x) for all x ∈ R.
If f is integrable on each finite closed interval, then for any a ∈ R we have
the equality

$$\int_a^{a+T} f(x)\,dx = \int_0^T f(x)\,dx,$$

that is, the integral of a periodic function over an interval whose length equals
the period T of the function is independent of the location of the interval of
integration on the real line:

$$\begin{aligned}
\int_a^{a+T} f(x)\,dx &= \int_a^0 f(x)\,dx + \int_0^T f(x)\,dx + \int_T^{a+T} f(x)\,dx = \\
&= \int_0^T f(x)\,dx + \int_a^0 f(x)\,dx + \int_0^a f(t+T)\,d(t+T) = \\
&= \int_0^T f(x)\,dx + \int_a^0 f(x)\,dx + \int_0^a f(t)\,dt = \int_0^T f(x)\,dx.
\end{aligned}$$

Here we have made the change of variable x = t + T and used the
periodicity of the function f(x).
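
The independence of the integral from the position of the interval, as in Example 4, can be observed numerically. The sketch below assumes the sample function sin x + cos²(3x), which has period T = 2π and integral π over a period; the helper names `simpson` and `integral_over_period` are hypothetical.

```python
import math

def simpson(f, a, b, steps=400):
    """Composite Simpson rule for the integral of f over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

T = 2 * math.pi

def f(x):
    """Sample function with period T = 2*pi (an assumption of this sketch)."""
    return math.sin(x) + math.cos(3 * x) ** 2

def integral_over_period(a):
    """Integral of f over [a, a + T]."""
    return simpson(f, a, a + T)

for a in (-5.0, 0.0, 0.3, 17.0):
    print(a, integral_over_period(a))
```

All printed values should coincide with the integral over [0, T] up to rounding.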




Example 5. Suppose we need to compute the integral $\int_0^1 \sin(x^2)\,dx$, for
example within 10^{-2}.

We know that the primitive $\int \sin(x^2)\,dx$ (the Fresnel integral) cannot be
expressed in terms of elementary functions, so that it is impossible to use the
Newton-Leibniz formula here in the traditional sense.

We take a different approach. When studying Taylor's formula in differential
calculus, we found as an example (see Example 11 of Sect. 5.3) that
on the interval [-1, 1] the equality

$$\sin x \approx x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 =: P(x)$$

holds within 10^{-3}.

But if |sin x - P(x)| < 10^{-3} on the interval [-1, 1], then |sin(x²) -
P(x²)| < 10^{-3} also, for 0 ≤ x ≤ 1.

Consequently,

$$\left| \int_0^1 \sin(x^2)\,dx - \int_0^1 P(x^2)\,dx \right| \le \int_0^1 \bigl|\sin(x^2) - P(x^2)\bigr|\,dx < \int_0^1 10^{-3}\,dx = 10^{-3}.$$

Thus, to compute the integral $\int_0^1 \sin(x^2)\,dx$ with the required precision, it
suffices to compute the integral $\int_0^1 P(x^2)\,dx$. But

$$\int_0^1 P(x^2)\,dx = \int_0^1 \left(x^2 - \frac{1}{3!}x^6 + \frac{1}{5!}x^{10}\right) dx = \left(\frac{1}{3}x^3 - \frac{1}{3!\,7}x^7 + \frac{1}{5!\,11}x^{11}\right)\Bigg|_0^1 = \frac{1}{3} - \frac{1}{3!\,7} + \frac{1}{5!\,11} = 0.310 \pm 10^{-3},$$

and therefore

$$\int_0^1 \sin(x^2)\,dx = 0.310 \pm 2 \cdot 10^{-3} = 0.31 \pm 10^{-2}.$$
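
The arithmetic of Example 5 can be reproduced in a few lines. This is a hedged sketch: the helper name `simpson` is hypothetical, and the direct quadrature of sin(x²) is included only as an independent cross-check, not as part of the argument in the text.

```python
import math

def simpson(f, a, b, steps=200):
    """Composite Simpson rule for the integral of f over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def P(x):
    """Taylor polynomial x - x^3/3! + x^5/5! of sin x used in Example 5."""
    return x - x ** 3 / 6 + x ** 5 / 120

# Term-by-term integral of P(x^2) over [0, 1]: 1/3 - 1/(3! * 7) + 1/(5! * 11).
termwise = 1 / 3 - 1 / (6 * 7) + 1 / (120 * 11)

# Independent cross-checks by quadrature.
quadrature_of_P = simpson(lambda x: P(x * x), 0.0, 1.0)
quadrature_of_sin = simpson(lambda x: math.sin(x * x), 0.0, 1.0)

print(termwise, quadrature_of_P, quadrature_of_sin)
```

The first two numbers should agree closely, and the third should differ from them by far less than the bound 10^{-3} used in the text.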


Example 6. The quantity $\mu = \frac{1}{b-a}\int_a^b f(x)\,dx$ is called the integral average
value of the function on the closed interval [a,b].
Let f be a function that is defined on R and integrable on any closed
interval. We use f to construct the new function

$$F_\delta(x) = \frac{1}{2\delta}\int_{x-\delta}^{x+\delta} f(t)\,dt,$$

whose value at the point x is the integral average value of f in the
δ-neighborhood of x.
We shall show that F_δ(x) (called the average of f) is, compared to f,
more regular. More precisely, if f is integrable on any interval [a,b], then
F_δ(x) is continuous on R, and if f ∈ C(R), then F_δ(x) ∈ C^{(1)}(R).
We verify first that F_δ(x) is continuous:

$$\bigl|F_\delta(x+h) - F_\delta(x)\bigr| = \frac{1}{2\delta}\left| \int_{x+\delta}^{x+\delta+h} f(t)\,dt + \int_{x-\delta+h}^{x-\delta} f(t)\,dt \right| \le \frac{1}{2\delta}\bigl(C|h| + C|h|\bigr) = \frac{C}{\delta}|h|,$$

if |f(t)| ≤ C, for example, in the 2δ-neighborhood of x and |h| < δ. It is
obvious that this estimate implies the continuity of F_δ(x).
Now if f ∈ C(R), then by the rule for differentiating a composite function

$$\frac{d}{dx}\int_a^{\varphi(x)} f(t)\,dt = \frac{d}{d\varphi}\int_a^{\varphi} f(t)\,dt \cdot \frac{d\varphi}{dx} = f(\varphi(x))\,\varphi'(x),$$

so that from the expression

$$F_\delta(x) = \frac{1}{2\delta}\int_a^{x+\delta} f(t)\,dt - \frac{1}{2\delta}\int_a^{x-\delta} f(t)\,dt$$

we find that

$$F_\delta'(x) = \frac{f(x+\delta) - f(x-\delta)}{2\delta}.$$

After the change of variable t = x + u in the integral, the function F_δ(x)
can be written as

$$F_\delta(x) = \frac{1}{2\delta}\int_{-\delta}^{\delta} f(x+u)\,du.$$

If f ∈ C(R), then, applying the first mean-value theorem, we find that

$$F_\delta(x) = \frac{1}{2\delta}\, f(x+\tau) \cdot 2\delta = f(x+\tau),$$

where |τ| ≤ δ. It follows that

$$\lim_{\delta \to +0} F_\delta(x) = f(x),$$

which is completely natural.
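
Both conclusions of Example 6 can be checked numerically. The sketch below is illustrative: it assumes f = cos, the point x = 0.3, and hypothetical helper names `simpson` and `average`. It compares a central difference quotient of the average with the quotient (f(x + δ) - f(x - δ))/(2δ) and evaluates the average for a small δ.

```python
import math

def simpson(f, a, b, steps=200):
    """Composite Simpson rule for the integral of f over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def average(f, x, delta):
    """Integral average of f over the delta-neighborhood of x, as in Example 6."""
    return simpson(f, x - delta, x + delta) / (2 * delta)

f, x, delta, h = math.cos, 0.3, 0.5, 1e-4

# Derivative of the average: central difference versus the formula from the text.
numeric_derivative = (average(f, x + h, delta) - average(f, x - h, delta)) / (2 * h)
formula_derivative = (f(x + delta) - f(x - delta)) / (2 * delta)
print(numeric_derivative, formula_derivative)

# Limit as delta -> +0: the average approaches f(x).
for d in (1e-1, 1e-2, 1e-3):
    print(d, average(f, x, d), f(x))
```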


6.3.6 Problems and Exercises 


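1. Using the integral, find

a) $\lim_{n \to \infty} \left[ \frac{n}{(n+1)^2} + \cdots + \frac{n}{(2n)^2} \right]$;

b) $\lim_{n \to \infty} \frac{1^{\alpha} + 2^{\alpha} + \cdots + n^{\alpha}}{n^{\alpha+1}}$, if α ≥ 0.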
2. a) Show that any continuous function on an open interval has a primitive on 
that interval. 


b) Show that if f ∈ C^{(1)}[a,b], then f can be represented as the difference of
two nondecreasing functions on [a,b] (see Problem 4 of Sect. 6.1).


3. Show that if the function g is smooth, then the second mean-value theorem 
(Theorem 6 of Sect. 6.2) can be reduced to the first mean-value theorem through 
integration by parts. 


4. Show that if f ∈ C(R), then for any fixed closed interval [a,b], given ε > 0 one
can choose δ > 0 so that the inequality |F_δ(x) - f(x)| < ε holds on [a,b], where F_δ
is the average of the function studied in Example 6.


5. Show that

$$\int_1^{x^2} \frac{e^t}{t}\,dt \sim \frac{1}{x^2}\,e^{x^2} \quad \text{as } x \to +\infty.$$


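6. a) Verify that the function $f(x) = \int_x^{x+1} \sin(t^2)\,dt$ has the following representation
as x → ∞:

$$f(x) = \frac{\cos(x^2)}{2x} - \frac{\cos\bigl((x+1)^2\bigr)}{2(x+1)} + O\!\left(\frac{1}{x^2}\right).$$
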
b) Find $\liminf_{x \to \infty} x f(x)$ and $\limsup_{x \to \infty} x f(x)$.


7. Show that if f : R → R is a periodic function that is integrable on every closed
interval [a,b] ⊂ R, then the function

$$F(x) = \int_a^x f(t)\,dt$$

can be represented as the sum of a linear function and a periodic function.


8. a) Verify that for x > 1 and n ∈ N the function

$$P_n(x) = \frac{1}{\pi}\int_0^{\pi} \left(x + \sqrt{x^2 - 1}\,\cos\varphi\right)^n d\varphi$$

is a polynomial of degree n (the nth Legendre polynomial).
b) Show that 


T 


TEEN OE.. __., 
ia | G v a) 


9. Let f be a real-valued function defined on a closed interval [a,b] ⊂ R and
ξ_1, ..., ξ_m distinct points of this interval. The values of the Lagrange interpolating
polynomial of degree m - 1

$$L_{m-1}(x) = \sum_{j=1}^{m} f(\xi_j) \prod_{i \ne j} \frac{x - \xi_i}{\xi_j - \xi_i}$$

are equal to the values of the function at the points ξ_1, ..., ξ_m (the nodes of the
interpolation), and if f ∈ C^{(m)}[a,b], then

$$f(x) - L_{m-1}(x) = \frac{1}{m!}\, f^{(m)}(\zeta(x))\,\omega_m(x),$$

where $\omega_m(x) = \prod_{i=1}^{m} (x - \xi_i)$ and ζ(x) ∈ ]a,b[ (see Exercise 11 in Sect. 5.3).

Let $\xi_i = \frac{b+a}{2} + \frac{b-a}{2}\theta_i$; then θ_i ∈ [-1, 1], i = 1, ..., m.
a) Show that

$$\int_a^b L_{m-1}(x)\,dx = \frac{b-a}{2}\sum_{i=1}^{m} c_i f(\xi_i),$$

where

$$c_i = \int_{-1}^{1} \left( \prod_{j \ne i} \frac{t - \theta_j}{\theta_i - \theta_j} \right) dt.$$

In particular

a1) $\int_a^b L_0(x)\,dx = (b-a)\, f\!\left(\frac{a+b}{2}\right)$, if m = 1, θ_1 = 0;

a2) $\int_a^b L_1(x)\,dx = \frac{b-a}{2}\bigl[f(a) + f(b)\bigr]$, if m = 2, θ_1 = -1, θ_2 = 1;

a3) $\int_a^b L_2(x)\,dx = \frac{b-a}{6}\left[f(a) + 4f\!\left(\frac{a+b}{2}\right) + f(b)\right]$, if m = 3, θ_1 = -1,
θ_2 = 0, θ_3 = 1.

b) Assuming that f ∈ C^{(m)}[a,b] and setting $M_m = \max_{x \in [a,b]} |f^{(m)}(x)|$, estimate
the magnitude R_m of the absolute error in the formula

$$\int_a^b f(x)\,dx = \int_a^b L_{m-1}(x)\,dx + R_m \tag{$*$}$$

and show that $|R_m| \le \frac{M_m}{m!}\int_a^b |\omega_m(x)|\,dx$.


c) In cases a1), a2), and a3) formula (*) is called respectively the rectangular,
trapezoidal, and parabolic rule. In the last case it is also called Simpson's rule.⁸
Show that the following formulas hold in cases a1), a2), and a3):

$$R_1 = \frac{f''(\xi_1)}{24}(b-a)^3, \qquad R_2 = -\frac{f''(\xi_2)}{12}(b-a)^3, \qquad R_3 = -\frac{f^{(4)}(\xi_3)}{2880}(b-a)^5,$$

where ξ_1, ξ_2, ξ_3 ∈ [a,b] and the function f belongs to a suitable class C^{(k)}[a,b].


d) Let f be a polynomial P. What is the highest degree of polynomials P for
which the rectangular, trapezoidal, and parabolic rules respectively are exact?


Let h = (b - a)/n, x_k = a + hk (k = 0, 1, ..., n), and y_k = f(x_k).


e) Show that in the rectangular rule

$$\int_a^b f(x)\,dx = h(y_0 + y_1 + \cdots + y_{n-1}) + R_1$$

the remainder has the form $R_1 = \frac{f'(\xi)}{2}(b-a)h$, where ξ ∈ [a,b].
f) Show that in the trapezoidal rule

$$\int_a^b f(x)\,dx = \frac{h}{2}\bigl[(y_0 + y_n) + 2(y_1 + y_2 + \cdots + y_{n-1})\bigr] + R_2$$

the remainder has the form $R_2 = -\frac{f''(\xi)}{12}(b-a)h^2$, where ξ ∈ [a,b].

⁸ T. Simpson (1710-1761), British mathematician.

g) Show that in Simpson's rule (the parabolic rule)

$$\int_a^b f(x)\,dx = \frac{h}{3}\bigl[(y_0 + y_n) + 4(y_1 + y_3 + \cdots + y_{n-1}) + 2(y_2 + y_4 + \cdots + y_{n-2})\bigr] + R_3,$$

which can be written for even values of n, the remainder R_3 has the form

$$R_3 = -\frac{f^{(4)}(\xi)}{180}(b-a)h^4,$$

where ξ ∈ [a,b].
h) Starting from the relation

$$\pi = 4\int_0^1 \frac{dx}{1 + x^2},$$

compute π within 10^{-3}, using the rectangular, trapezoidal, and parabolic rules.
Note carefully the efficiency of Simpson's rule, which is, for that reason, the most
widely used quadrature formula. (That is the name given to formulas for numerical
integration in the one-dimensional case, in which the integral is identified with the
area of the corresponding curvilinear trapezoid.)


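10. By transforming formula (6.46), obtain the following forms for the remainder
term in Taylor's formula, where we have set h = x - a:

a) $\frac{h^n}{(n-1)!}\int_0^1 f^{(n)}(a + \tau h)(1 - \tau)^{n-1}\,d\tau$;

b) $\frac{h^n}{n!}\int_0^1 f^{(n)}\bigl(x - h\sqrt[n]{t}\bigr)\,dt$.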


11. Show that the important formula (6.48) for change of variable in an integral 
remains valid without the assumption that the function in the substitution is mono- 
tonic. 
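
For readers who wish to experiment with the composite rules of Problem 9 e), f), and g), the following sketch implements them directly from the formulas stated there and applies them to the relation of Problem 9 h). It is only an illustration and does not replace the error analysis asked for in the problem; the function names and the numbers of subintervals n are assumptions chosen so that each rule reaches the accuracy 10^{-3}.

```python
import math

def rectangular(f, a, b, n):
    """Rectangular rule of Problem 9 e): h * (y_0 + y_1 + ... + y_{n-1})."""
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

def trapezoidal(f, a, b, n):
    """Trapezoidal rule of Problem 9 f): h/2 * [(y_0 + y_n) + 2(y_1 + ... + y_{n-1})]."""
    h = (b - a) / n
    return h / 2 * (f(a) + f(b) + 2 * sum(f(a + k * h) for k in range(1, n)))

def simpson_rule(f, a, b, n):
    """Simpson's rule of Problem 9 g) for even n:
    h/3 * [(y_0 + y_n) + 4(y_1 + y_3 + ...) + 2(y_2 + y_4 + ...)]."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + k * h) for k in range(1, n, 2))
    even = sum(f(a + k * h) for k in range(2, n, 2))
    return h / 3 * (f(a) + f(b) + 4 * odd + 2 * even)

g = lambda x: 4.0 / (1.0 + x * x)   # integral over [0, 1] equals pi

pi_rect = rectangular(g, 0.0, 1.0, 2000)
pi_trap = trapezoidal(g, 0.0, 1.0, 20)
pi_simp = simpson_rule(g, 0.0, 1.0, 4)

for name, value in (("rectangular", pi_rect), ("trapezoidal", pi_trap), ("Simpson", pi_simp)):
    print(f"{name:12s} {value:.8f}  error = {abs(value - math.pi):.2e}")
```

Comparing the number of function evaluations needed by each rule for the same accuracy gives a concrete impression of the efficiency mentioned in Problem 9 h).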


6.4 Some Applications of Integration 


There is a single pattern of ideas that often guides the use of integration in 
applications; for that reason it is useful to expound this pattern once in its 
pure form. The first subsection of this section is devoted to that purpose. 


6.4.1 Additive Interval Functions and the Integral 


In discussing the additivity of the integral over intervals in Sect. 6.2 we
introduced the concept of an additive (oriented) interval function. We recall
that this is a function (α, β) ↦ I(α, β) that assigns a number I(α, β) to each
ordered pair of points (α, β) of a fixed closed interval [a,b], in such a way
that the following equality holds for any triple of points α, β, γ ∈ [a,b]:

$$I(\alpha, \gamma) = I(\alpha, \beta) + I(\beta, \gamma). \tag{6.49}$$

It follows from (6.49) when α = β = γ that I(α, α) = 0, while for α = γ
we find that I(α, β) + I(β, α) = 0, that is, I(α, β) = -I(β, α). This relation
shows the effect of the order of the points α, β.

Setting

$$F(x) = I(a, x),$$

by the additivity of the function I we have

$$I(\alpha, \beta) = I(a, \beta) - I(a, \alpha) = F(\beta) - F(\alpha).$$

Thus, every additive oriented interval function has the form

$$I(\alpha, \beta) = F(\beta) - F(\alpha), \tag{6.50}$$

where x ↦ F(x) is a function of points on the interval [a,b].

It is easy to verify that the converse is also true, that is, any function
x ↦ F(x) defined on [a,b] generates an additive (oriented) interval function
by formula (6.50).

We now give two typical examples. 


Example 1. If f ∈ R[a,b], the function $F(x) = \int_a^x f(t)\,dt$ generates via
formula (6.50) the additive function

$$I(\alpha, \beta) = \int_{\alpha}^{\beta} f(t)\,dt.$$

We remark that in this case the function F(x) is continuous on the closed
interval [a,b].


Example 2. Suppose the interval [0,1] is a weightless string with a bead of
unit mass attached to the string at the point x = 1/2.

Let F(x) be the amount of mass located in the closed interval [0, x] of the
string. Then by hypothesis

$$F(x) = \begin{cases} 0 & \text{for } x < 1/2, \\ 1 & \text{for } 1/2 \le x \le 1. \end{cases}$$

The physical meaning of the additive function

$$I(\alpha, \beta) = F(\beta) - F(\alpha)$$

for β > α is the amount of mass located in the half-open interval ]α, β].
Since the function F is discontinuous, the additive function I(α, β) in this
case cannot be represented as the Riemann integral of a function (a mass
density). (This density, that is, the limit of the ratio of the mass in an interval
to the length of the interval, would have to be zero at any point of the interval
[a,b] except the point x = 1/2, where it would have to be infinite.)


We shall now prove a sufficient condition for an additive interval function 
to be generated by an integral, one that will be useful in what follows. 


Proposition 1. Suppose the additive function I(α, β) defined for points α, β
of a closed interval [a,b] is such that there exists a function f ∈ R[a,b]
connected with I as follows: the relation

$$\inf_{x \in [\alpha,\beta]} f(x)\,(\beta - \alpha) \le I(\alpha, \beta) \le \sup_{x \in [\alpha,\beta]} f(x)\,(\beta - \alpha)$$

holds for any closed interval [α, β] such that a ≤ α ≤ β ≤ b. Then

$$I(a, b) = \int_a^b f(x)\,dx.$$

Proof. Let P be an arbitrary partition a = x_0 < ··· < x_n = b of the closed
interval [a,b], let $m_i = \inf_{x \in [x_{i-1}, x_i]} f(x)$, and let $M_i = \sup_{x \in [x_{i-1}, x_i]} f(x)$.
For each interval [x_{i-1}, x_i] of the partition P we have by hypothesis

$$m_i\,\Delta x_i \le I(x_{i-1}, x_i) \le M_i\,\Delta x_i.$$

Summing these inequalities and using the additivity of the function I(α, β),
we obtain

$$\sum_{i=1}^{n} m_i\,\Delta x_i \le I(a, b) \le \sum_{i=1}^{n} M_i\,\Delta x_i.$$

The extreme terms in this last relation are familiar to us, being the lower
and upper Darboux sums of the function f corresponding to the partition
P of the closed interval [a,b]. As λ(P) → 0 they both have the integral of
f over the closed interval [a,b] as their limit. Thus, passing to the limit as
λ(P) → 0, we find that

$$I(a, b) = \int_a^b f(x)\,dx. \qquad \square$$
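
The squeezing argument in the proof of Proposition 1 can be made tangible with a small computation. The sketch below is illustrative and rests on stated assumptions: the function f(x) = x² on [0, 1] is nondecreasing, so on each interval of a uniform partition its infimum and supremum are attained at the left and right endpoints; the helper name `darboux_sums` is hypothetical.

```python
def darboux_sums(f, a, b, n):
    """Lower and upper Darboux sums of a nondecreasing function f on [a, b]
    for the uniform partition into n intervals (inf at left end, sup at right end)."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    lower = sum(f(xs[i - 1]) * h for i in range(1, n + 1))
    upper = sum(f(xs[i]) * h for i in range(1, n + 1))
    return lower, upper

f = lambda x: x * x          # nondecreasing on [0, 1]; its integral there is 1/3
exact = 1.0 / 3.0

for n in (10, 100, 2000):
    lower, upper = darboux_sums(f, 0.0, 1.0, n)
    print(n, lower, exact, upper, upper - lower)
```

Any quantity I(a, b) trapped between the two sums for every partition is forced to equal the integral, since the gap between the sums tends to zero as the partition is refined.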


Let us now illustrate Proposition 1 in action.


6.4.2 Arc Length


Suppose a particle is moving in space R³ and suppose its law of motion
is known to be r(t) = (x(t), y(t), z(t)), where x(t), y(t), and z(t) are the
rectangular Cartesian coordinates of the point at time t.

We wish to define the length l[a, b] of the path traversed during the time
interval a ≤ t ≤ b.

Let us make some concepts more precise.


Definition 1. A path in R³ is a mapping t ↦ (x(t), y(t), z(t)) of an interval
of the real line into R³ defined by functions x(t), y(t), z(t) that are continuous
on the interval.


Definition 2. If t ↦ (x(t), y(t), z(t)) is a path for which the domain of the
parameter t is the closed interval [a, b], then the points

A = (x(a), y(a), z(a)) and B = (x(b), y(b), z(b))

in R³ are called the initial point and terminal point of the path.


Definition 3. A path is closed if it has both an initial and terminal point, 
and these points coincide. 


Definition 4. If Γ : I → R³ is a path, the image Γ(I) of the interval I in
R³ is called the support of the path.


The support of an abstract path may turn out to be not at all what 
we would like to call a curve. There are examples of paths whose supports, 
for example, contain an entire three-dimensional cube (the so-called Peano 
“curves” ). However, if the functions x(t), y(t), and z(t) are sufficiently regular 
(as happens, for example, in the case of a mechanical motion, when they are 
differentiable), we are guaranteed that nothing contrary to our intuition will 
occur, as one can verify rigorously. 


Definition 5. A path Γ : I → R³ for which the mapping I → Γ(I) is
one-to-one is called a simple path or parametrized curve, and its support is called
a curve in R³.


Definition 6. A closed path Γ : [a,b] → R³ is called a simple closed path or
simple closed curve if the path Γ : [a,b[ → R³ is simple.


Thus a simple path differs from an arbitrary path in that when moving 
over its support we do not return to points reached earlier, that is, we do 
not intersect our trajectory anywhere except possibly at the terminal point, 
when the simple path is closed. 


Definition 7. The path $\Gamma : I \to \mathbb{R}^3$ is called a path of a given smoothness
if the functions $x(t)$, $y(t)$, and $z(t)$ have that smoothness.


(For example, the smoothness $C[a, b]$, $C^{(1)}[a,b]$, or $C^{(k)}[a, b]$.)

Definition 8. A path $\Gamma : [a,b] \to \mathbb{R}^3$ is piecewise smooth if the closed
interval $[a,b]$ can be partitioned into a finite number of closed intervals on
each of which the corresponding restriction of the mapping $\Gamma$ is defined by
continuously differentiable functions.


It is smooth paths, that is, paths of class $C^{(1)}$, and piecewise smooth paths
that we intend to study just now.

Let us return to the original problem, which we can now state as the
problem of defining the length of a smooth path $\Gamma : [a,b] \to \mathbb{R}^3$.

Our initial ideas about the length $l[\alpha, \beta]$ of the path traversed during the
time interval $\alpha \le t \le \beta$ are as follows: First, if $\alpha < \beta < \gamma$, then

$$l[\alpha, \gamma] = l[\alpha, \beta] + l[\beta, \gamma] ,$$

and second, if $v(t) = (\dot x(t), \dot y(t), \dot z(t))$ is the velocity of the point at time $t$,
then

$$\inf_{t \in [\alpha,\beta]} |v(t)|\,(\beta - \alpha) \le l[\alpha, \beta] \le \sup_{t \in [\alpha,\beta]} |v(t)|\,(\beta - \alpha) .$$

Thus, if the functions $x(t)$, $y(t)$, and $z(t)$ are continuously differentiable
on $[a,b]$, by Proposition 1 we arrive in a deterministic manner at the formula

$$l[a,b] = \int_a^b |v(t)|\,dt = \int_a^b \sqrt{\dot x^2(t) + \dot y^2(t) + \dot z^2(t)}\,dt , \tag{6.51}$$

which we now take as the definition of the length of a smooth path $\Gamma : [a,b] \to \mathbb{R}^3$.
If $z(t) \equiv 0$, the support lies in a plane, and formula (6.51) assumes the
form

$$l[a,b] = \int_a^b \sqrt{\dot x^2(t) + \dot y^2(t)}\,dt . \tag{6.52}$$


Example 3. Let us test formula (6.52) on a familiar object. Suppose the point
moves according to the law

$$x = R\cos 2\pi t , \qquad y = R\sin 2\pi t . \tag{6.53}$$

Over the time interval $[0,1]$ the point will traverse a circle of radius $R$,
that is, a path of length $2\pi R$ if the length of a circle can be computed from
this formula.

Let us carry out the computation according to formula (6.52):

$$l[0, 1] = \int_0^1 \sqrt{(-2\pi R\sin 2\pi t)^2 + (2\pi R\cos 2\pi t)^2}\,dt = 2\pi R .$$
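The same computation can be repeated numerically. The following is only a sketch: the helper `path_length`, the midpoint rule, the number of subintervals, and the radius `R = 3.0` are illustrative assumptions, not part of the text.

```python
import math

def path_length(dx, dy, a, b, n=1000):
    """Midpoint-rule approximation of the integral of sqrt(dx^2 + dy^2) over [a, b]."""
    h = (b - a) / n
    return h * sum(math.hypot(dx(a + (i + 0.5) * h), dy(a + (i + 0.5) * h))
                   for i in range(n))

R = 3.0  # radius chosen only for illustration

# Derivatives of x = R cos(2*pi*t) and y = R sin(2*pi*t) from (6.53).
def dx(t):
    return -2 * math.pi * R * math.sin(2 * math.pi * t)

def dy(t):
    return 2 * math.pi * R * math.cos(2 * math.pi * t)

circle_length = path_length(dx, dy, 0.0, 1.0)
print(circle_length, 2 * math.pi * R)  # the two numbers should agree closely
```

Since the speed $|v(t)| = 2\pi R$ is constant here, the quadrature is exact up to rounding; for a general path the result is only an approximation.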
Despite the encouraging agreement of the results, the reasoning just carried
out contains some logical gaps that are worth paying attention to.

The functions $\cos\alpha$ and $\sin\alpha$, if we use the high-school definition of them,
are the Cartesian coordinates of the image $p$ of the point $p_0 = (1,0)$ under a
rotation through angle $\alpha$.

Up to sign, the quantity $\alpha$ is measured by the length of the arc of the
circle $x^2 + y^2 = 1$ between $p_0$ and $p$. Thus, in this approach to trigonometric
functions their definition relies on the concept of the length of an arc of a
circle and hence, in computing the circumference of a circle above, we were
in a certain sense completing a logical circle by giving the parametrization in
the form (6.53).

However, this difficulty, as we shall now see, is not fundamental, since a
parametrization of the circle can be given without resorting to trigonometric
functions at all.

Let us consider the problem of computing the length of the graph of a
function $y = f(x)$ defined on a closed interval $[a,b] \subset \mathbb{R}$. We have in mind
the computation of the length of the path $\Gamma : [a,b] \to \mathbb{R}^2$ having the special
parametrization

$$x \mapsto (x, f(x)) ,$$

from which one can conclude that the mapping $\Gamma : [a,b] \to \mathbb{R}^2$ is one-to-one.
Hence, by Definition 5 the graph of a function is a curve in $\mathbb{R}^2$.

In this case formula (6.52) can be simplified, since by setting $x = t$ and
$y = f(t)$ in it, we obtain

$$l[a,b] = \int_a^b \sqrt{1 + [f'(t)]^2}\,dt . \tag{6.54}$$


In particular, if we consider the semicircle

$$y = \sqrt{1 - x^2} , \qquad -1 \le x \le 1 ,$$

of the circle $x^2 + y^2 = 1$, we obtain for it

$$l = \int_{-1}^{1} \sqrt{1 + \left[\frac{-x}{\sqrt{1-x^2}}\right]^2}\,dx = \int_{-1}^{1} \frac{dx}{\sqrt{1-x^2}} . \tag{6.55}$$

But the integrand in this last integral is an unbounded function, and
hence the integral does not exist in the traditional sense we have studied. Does
this mean that a semicircle has no length? For the time being it means only that
this parametrization of the semicircle does not satisfy the condition that the
functions $\dot x$ and $\dot y$ be continuous, under which formula (6.52), and hence also
formula (6.54), was written. For that reason we must consider either broadening
the concept of integral or passing to a parametrization satisfying the
conditions under which (6.54) can be applied.
We remark that if we consider this parametrization on any closed interval
of the form $[-1+\delta, 1-\delta]$, where $-1 < -1+\delta < 1-\delta < 1$, then formula
(6.54) applies on that interval, and we find the length

$$l[-1+\delta, 1-\delta] = \int_{-1+\delta}^{1-\delta} \frac{dx}{\sqrt{1-x^2}}$$

for the arc of the circle lying above the closed interval $[-1+\delta, 1-\delta]$.
It is therefore natural to consider that the length $l$ of the semicircle is
the limit $\lim_{\delta \to +0} l[-1+\delta, 1-\delta]$. One can interpret the integral in (6.55) in the
same sense. We shall study this naturally arising extension of the concept of
a Riemann integral in the next section.

As for the particular problem we are studying, without even changing the
parametrization one can find, for example, the length $l\left[-\tfrac12, \tfrac12\right]$ of an arc of
the unit circle subtended by a chord congruent to the radius of the circle.
Then (from geometric considerations alone) it must be that $l = 3 \cdot l\left[-\tfrac12, \tfrac12\right]$.

We remark also that

$$\int \frac{dx}{\sqrt{1-x^2}} = \int \sqrt{1-x^2}\,dx - \frac12 \int \frac{x\,d(1-x^2)}{\sqrt{1-x^2}} = 2\int \sqrt{1-x^2}\,dx - x\sqrt{1-x^2} ,$$

and therefore

$$l[-1+\delta, 1-\delta] = 2\int_{-1+\delta}^{1-\delta} \sqrt{1-x^2}\,dx - \left(x\sqrt{1-x^2}\right)\Big|_{-1+\delta}^{1-\delta} .$$


Thus,

$$l = \lim_{\delta \to +0} l[-1+\delta, 1-\delta] = 2\int_{-1}^{1} \sqrt{1-x^2}\,dx .$$

The length of a semicircle of unit radius is denoted $\pi$, and we thus arrive
at the following formula:

$$\pi = 2\int_{-1}^{1} \sqrt{1-x^2}\,dx .$$


This last integral is an ordinary (not generalized) Riemann integral and 
can be computed with any precision. 
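As an illustration of this remark, the integral can be approximated by any elementary quadrature rule. The sketch below is an assumption-laden example rather than a recommended method: the names `midpoint`, `half_disk`, and `arc_length`, the choice of the midpoint rule, and the numbers of subintervals are all introduced here for illustration.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule on n equal subintervals of [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def half_disk(x):
    return math.sqrt(1.0 - x * x)

def arc_length(delta, n=20000):
    """l[-1+delta, 1-delta], computed from the bounded integrand sqrt(1 - x^2)."""
    lo, hi = -1.0 + delta, 1.0 - delta
    boundary = hi * half_disk(hi) - lo * half_disk(lo)
    return 2.0 * midpoint(half_disk, lo, hi, n) - boundary

# pi as twice the integral of sqrt(1 - x^2) over [-1, 1].
pi_estimate = 2.0 * midpoint(half_disk, -1.0, 1.0, 200000)
print(pi_estimate)

# The truncated arc lengths increase toward the same limit as delta -> +0.
for delta in (0.1, 0.01, 0.001):
    print(delta, arc_length(delta))
```

The library value `math.pi` is used only in checking such output, never inside the computation itself.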

If for $x \in [-1,1]$ we define $\arccos x$ as $l[x, 1]$, then by the computations
carried out above

$$\arccos x = \int_x^1 \frac{dt}{\sqrt{1-t^2}} ,$$

or

$$\arccos x = x\sqrt{1-x^2} + 2\int_x^1 \sqrt{1-t^2}\,dt .$$

If we regard arc length as a primitive concept, then we must also regard
the function $x \mapsto \arccos x$ introduced just now and the function $x \mapsto \arcsin x$,
which can be introduced similarly, as primitive. But the functions $x \mapsto \cos x$
and $x \mapsto \sin x$ can then be obtained as the inverses of these functions on
the corresponding intervals. In essence, this is what is done in elementary
geometry.


The example of the length of a semicircle is instructive not only because 
while studying it we made a remark on the definition of the trigonometric 
functions that may be of use to someone, but also because it naturally raises 
the question whether the number defined by formula (6.51) depends on the 
coordinate system $x, y, z$ and the parametrization of the curve when one is
finding the length of a curve. 

Leaving to the reader the analysis of the role played by three-dimensional 
Cartesian coordinates, we shall examine here the role of the parametrization. 

We need to clarify that by a parametrization of a curve in $\mathbb{R}^3$, we mean
a definition of a simple path $\Gamma : I \to \mathbb{R}^3$ whose support is that curve. The
point or number $t \in I$ is called a parameter and the interval $I$ the domain of
the parameter.

If $\Gamma : I \to \mathcal{L}$ and $\tilde\Gamma : \tilde I \to \mathcal{L}$ are two one-to-one mappings with the same
set of values $\mathcal{L}$, there naturally arise one-to-one mappings $\tilde\Gamma^{-1} \circ \Gamma : I \to \tilde I$
and $\Gamma^{-1} \circ \tilde\Gamma : \tilde I \to I$ between the domains $I$ and $\tilde I$ of these mappings.

In particular, if there are two parametrizations of the same curve, then
there is a natural correspondence between the parameters $t \in I$ and $\tau \in \tilde I$,
$t = t(\tau)$ or $\tau = \tau(t)$, making it possible, knowing the parameter of a point in
one parametrization, to find its parameter in the other parametrization.

Let $\Gamma : [a,b] \to \mathcal{L}$ and $\tilde\Gamma : [\alpha, \beta] \to \mathcal{L}$ be two parametrizations of the
same curve with the correspondences $\Gamma(a) = \tilde\Gamma(\alpha)$ and $\Gamma(b) = \tilde\Gamma(\beta)$ between
their initial and terminal points. Then the transition functions $t = t(\tau)$ and
$\tau = \tau(t)$ from one parameter to another will be continuous, strictly monotonic
mappings of the closed intervals $a \le t \le b$ and $\alpha \le \tau \le \beta$ onto each other
with the initial points and terminal points corresponding: $a \leftrightarrow \alpha$, $b \leftrightarrow \beta$.

Here, if the curves $\Gamma$ and $\tilde\Gamma$ are defined by the triples $(x(t), y(t), z(t))$ and
$(\tilde x(\tau), \tilde y(\tau), \tilde z(\tau))$ of smooth functions such that
$|v(t)|^2 = \dot x^2(t) + \dot y^2(t) + \dot z^2(t) \ne 0$ on $[a,b]$ and
$|\tilde v(\tau)|^2 = \dot{\tilde x}^2(\tau) + \dot{\tilde y}^2(\tau) + \dot{\tilde z}^2(\tau) \ne 0$ on $[\alpha, \beta]$, then one can
verify that in this case the transition functions $t = t(\tau)$ and $\tau = \tau(t)$ are
smooth functions having positive derivatives on the intervals on which they
are defined.
We shall not undertake to verify this assertion here; it will eventually
be obtained as a corollary of the important implicit function theorem. At 
the moment this assertion is mostly intended as motivation for the following 
definition. 


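Definition 9. The path $\tilde\Gamma : [\alpha, \beta] \to \mathbb{R}^3$ is obtained from $\Gamma : [a,b] \to \mathbb{R}^3$
by an admissible change of parameter if there exists a smooth mapping
$T : [\alpha, \beta] \to [a,b]$ such that $T(\alpha) = a$, $T(\beta) = b$, $T'(\tau) > 0$ on $[\alpha, \beta]$, and
$\tilde\Gamma = \Gamma \circ T$.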


We now prove a general proposition. 


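Proposition 2. If a smooth path $\tilde\Gamma : [\alpha, \beta] \to \mathbb{R}^3$ is obtained from a smooth
path $\Gamma : [a,b] \to \mathbb{R}^3$ by an admissible change of parameter, then the lengths
of the two paths are equal.


Proof. Let $\tilde\Gamma : [\alpha, \beta] \to \mathbb{R}^3$ and $\Gamma : [a,b] \to \mathbb{R}^3$ be defined respectively by the
triples of smooth functions $\tau \mapsto (\tilde x(\tau), \tilde y(\tau), \tilde z(\tau))$ and $t \mapsto (x(t), y(t), z(t))$,
and let $t = t(\tau)$ be the admissible change of parameter under which

$$\tilde x(\tau) = x(t(\tau)) , \quad \tilde y(\tau) = y(t(\tau)) , \quad \tilde z(\tau) = z(t(\tau)) .$$

Using the definition (6.51) of path length, the rule for differentiating a
composite function, and the rule for change of variable in an integral, we have

$$\begin{aligned}
\int_a^b \sqrt{\dot x^2(t) + \dot y^2(t) + \dot z^2(t)}\,dt
&= \int_\alpha^\beta \sqrt{\dot x^2(t(\tau)) + \dot y^2(t(\tau)) + \dot z^2(t(\tau))}\;t'(\tau)\,d\tau \\
&= \int_\alpha^\beta \sqrt{[\dot x(t(\tau))t'(\tau)]^2 + [\dot y(t(\tau))t'(\tau)]^2 + [\dot z(t(\tau))t'(\tau)]^2}\,d\tau \\
&= \int_\alpha^\beta \sqrt{\dot{\tilde x}^2(\tau) + \dot{\tilde y}^2(\tau) + \dot{\tilde z}^2(\tau)}\,d\tau . \qquad \square
\end{aligned}$$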


Thus, in particular, we have shown that the length of a curve is indepen- 
dent of a smooth parametrization of it. 
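Proposition 2 can be checked numerically on a concrete path. In the sketch below everything specific is an assumption made for illustration: the helix $t \mapsto (\cos t, \sin t, t)$ on $[0, 2]$, the change of parameter $t = T(\tau) = \tau + \tau^2$ on $[0, 1]$ (for which $T(0) = 0$, $T(1) = 2$, and $T'(\tau) = 1 + 2\tau > 0$), and the midpoint rule used for the integrals.

```python
import math

def length(velocity, a, b, n=2000):
    """Midpoint-rule approximation of the integral of |velocity(t)| over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        vx, vy, vz = velocity(a + (i + 0.5) * h)
        total += math.sqrt(vx * vx + vy * vy + vz * vz)
    return total * h

def v(t):
    """Velocity of the helix t -> (cos t, sin t, t)."""
    return (-math.sin(t), math.cos(t), 1.0)

def T(tau):
    """Hypothetical admissible change of parameter mapping [0, 1] onto [0, 2]."""
    return tau + tau * tau

def dT(tau):
    return 1.0 + 2.0 * tau

def v_tilde(tau):
    """Velocity of the reparametrized path, by the chain rule."""
    vx, vy, vz = v(T(tau))
    return (vx * dT(tau), vy * dT(tau), vz * dT(tau))

len_original = length(v, 0.0, 2.0)
len_reparametrized = length(v_tilde, 0.0, 1.0)
print(len_original, len_reparametrized, 2.0 * math.sqrt(2.0))
```

Both integrals should agree with the exact value $2\sqrt{2}$, since the speed of the helix is the constant $\sqrt{2}$.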

The length of a piecewise smooth path is defined as the sum of the lengths 
of the smooth paths into which it can be divided; for that reason it is easy to 
verify that the length of a piecewise smooth path also does not change under 
an admissible change of its parameter. 

To conclude the discussion of the concept of the length of a path and the 
length of a curve (which, after Proposition 2, we now have the right to talk 
about), we consider another example. 
Example 4. Let us find the length of the ellipse defined by the canonical
equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \qquad (a \ge b > 0) . \tag{6.56}$$

Taking the parametrization $x = a\sin\psi$, $y = b\cos\psi$, $0 \le \psi \le 2\pi$, we
obtain

$$\begin{aligned}
l &= \int_0^{2\pi} \sqrt{(a\cos\psi)^2 + (-b\sin\psi)^2}\,d\psi = \int_0^{2\pi} \sqrt{a^2 - (a^2 - b^2)\sin^2\psi}\,d\psi \\
&= 4a\int_0^{\pi/2} \sqrt{1 - \frac{a^2 - b^2}{a^2}\sin^2\psi}\,d\psi = 4a\int_0^{\pi/2} \sqrt{1 - k^2\sin^2\psi}\,d\psi ,
\end{aligned}$$

where $k^2 = 1 - \frac{b^2}{a^2}$ is the square of the eccentricity of the ellipse.


The integral

$$E(k, \varphi) = \int_0^{\varphi} \sqrt{1 - k^2\sin^2\psi}\,d\psi$$

cannot be expressed in elementary functions, and is called an elliptic integral
because of the connection with the ellipse just discussed. More precisely,
$E(k, \varphi)$ is the elliptic integral of second kind in the Legendre form. The value
that it assumes for $\varphi = \pi/2$ depends only on $k$, is denoted $E(k)$, and is called
the complete elliptic integral of second kind. Thus $E(k) = E(k, \pi/2)$, so that
the length of an ellipse has the form $l = 4aE(k)$ in this notation.
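Although $E(k)$ is not elementary, it is easy to approximate. The following sketch evaluates $l = 4aE(k)$ with a midpoint rule; the function names `complete_E` and `ellipse_length`, the quadrature rule, and the sample semi-axes are assumptions introduced for illustration.

```python
import math

def complete_E(k, n=2000):
    """Midpoint-rule approximation of E(k), the integral of
    sqrt(1 - k^2 sin^2(psi)) over [0, pi/2]."""
    h = (math.pi / 2) / n
    return h * sum(math.sqrt(1.0 - (k * math.sin((i + 0.5) * h)) ** 2)
                   for i in range(n))

def ellipse_length(a, b):
    """Length 4 a E(k) of the ellipse with semi-axes a >= b > 0."""
    k = math.sqrt(1.0 - (b / a) ** 2)  # eccentricity
    return 4.0 * a * complete_E(k)

print(ellipse_length(1.0, 1.0))  # a circle of radius 1: compare with 2*pi
print(ellipse_length(2.0, 1.0))  # an ellipse with semi-axes 2 and 1
```

For $a = b$ the eccentricity vanishes and the formula reduces to the circumference $2\pi a$, which gives a simple sanity check.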


6.4.3 The Area of a Curvilinear Trapezoid 


Consider the figure $aABb$ of Fig. 6.2, which is called a curvilinear trapezoid.
This figure is bounded by the vertical line segments $aA$ and $bB$, the closed
interval $[a,b]$ on the $x$-axis, and the curve $AB$, which is the graph of an
integrable function $y = f(x)$ on $[a,b]$.
Let $[\alpha, \beta]$ be a closed interval contained in $[a,b]$. We denote by $S(\alpha, \beta)$
the area of the curvilinear trapezoid $\alpha f(\alpha) f(\beta) \beta$ corresponding to it.
Our ideas about area are as follows: if $a \le \alpha < \beta < \gamma \le b$, then

$$S(\alpha, \gamma) = S(\alpha, \beta) + S(\beta, \gamma)$$

(additivity of areas) and

$$\inf_{x \in [\alpha,\beta]} f(x)\,(\beta - \alpha) \le S(\alpha, \beta) \le \sup_{x \in [\alpha,\beta]} f(x)\,(\beta - \alpha) .$$

(The area of an enclosing figure is not less than the area of the figure
enclosed.)

Hence by Proposition 1, the area of this figure must be computed from
the formula

$$S(a,b) = \int_a^b f(x)\,dx . \tag{6.57}$$


Example 5. Let us use formula (6.57) to compute the area of the ellipse given
by the canonical equation (6.56).

By the symmetry of the figure and the assumed additivity of areas, it
suffices to find the area of just the part of the ellipse in the first quadrant,
then quadruple the result. Here are the computations:

$$\begin{aligned}
S &= 4\int_0^a b\sqrt{1 - \frac{x^2}{a^2}}\,dx = 4b\int_0^{\pi/2} \sqrt{1 - \sin^2 t}\;a\cos t\,dt \\
&= 4ab\int_0^{\pi/2} \cos^2 t\,dt = 2ab\int_0^{\pi/2} (1 + \cos 2t)\,dt = \pi ab .
\end{aligned}$$

Along the way we have made the change of variable $x = a\sin t$, $0 \le t \le \pi/2$.
Thus $S = \pi ab$. In particular, when $a = b = R$, we obtain the formula $\pi R^2$
for the area of a disk of radius $R$.


Remark. It should be noted that formula (6.57) gives the area of the curvilinear
trapezoid under the condition that $f(x) \ge 0$ on $[a,b]$. If $f$ is an arbitrary
integrable function, then the integral (6.57) obviously gives the algebraic sum
of the areas of corresponding curvilinear trapezoids lying above and below
the $x$-axis. When this is done, the areas of trapezoids lying above the $x$-axis
are summed with a positive sign and those below with a negative sign.


6.4.4 Volume of a Solid of Revolution 


Now suppose the curvilinear trapezoid shown in Fig. 6.2 is revolved about
the closed interval $[a, b]$. Let us determine the volume of the solid that results.
We denote by $V(\alpha, \beta)$ the volume of the solid obtained by revolving the
curvilinear trapezoid $\alpha f(\alpha) f(\beta) \beta$ (see Fig. 6.2) corresponding to the closed
interval $[\alpha, \beta] \subset [a,b]$.

According to our ideas about volume the following relations must hold: if
$a \le \alpha < \beta < \gamma \le b$, then

$$V(\alpha, \gamma) = V(\alpha, \beta) + V(\beta, \gamma)$$

and

$$\pi\Big(\inf_{x \in [\alpha,\beta]} f(x)\Big)^2 (\beta - \alpha) \le V(\alpha, \beta) \le \pi\Big(\sup_{x \in [\alpha,\beta]} f(x)\Big)^2 (\beta - \alpha) .$$

In this last relation we have estimated the volume $V(\alpha, \beta)$ by the volumes
of inscribed and circumscribed cylinders and used the formula for the volume
of a cylinder (which is not difficult to obtain, once the area of a disk has been
found).

Then by Proposition 1

$$V(a,b) = \pi\int_a^b f^2(x)\,dx . \tag{6.58}$$


Example 6. By revolving about the $x$-axis the semicircle bounded by the
closed interval $[-R, R]$ of the axis and the arc of the circle $y = \sqrt{R^2 - x^2}$,
$-R \le x \le R$, one can obtain a three-dimensional ball of radius $R$ whose
volume is easily computed from (6.58):

$$V = \pi\int_{-R}^{R} (R^2 - x^2)\,dx = \frac43\pi R^3 .$$

More details on the measurement of lengths, areas, and volumes will be
given in Part 2 of this course. At that time we shall solve the problem of the
invariance of the definitions we have given.


6.4.5 Work and Energy 


The energy expenditure connected with the movement of a body under the
influence of a constant force in the direction in which the force acts is
measured by the product $F \cdot S$ of the magnitude of the force and the magnitude
of the displacement. This quantity is called the work done by the force in the
displacement. In general the directions of the force and displacement may be
noncollinear (for example, when we pull a sled by a rope), and then the work
is defined as the inner product $\langle F, S\rangle$ of the force vector and the displacement
vector.

Let us consider some examples of the computation of work and the use 
of the related concept of energy. 
Example 7. The work that must be performed against the force of gravity
to lift a body of mass $m$ vertically from height $h_1$ above the surface of the
Earth to height $h_2$ is, by the definition just given, $mg(h_2 - h_1)$. It is assumed
that the entire operation occurs near the surface of the Earth, so that the
variation of the gravitational force $mg$ can be neglected. The general case is
studied in Example 10.


Example 8. Suppose we have a perfectly elastic spring, one end of which is
attached at the point 0 of the real line, while the other is at the point $x$. It
is known that the force necessary to hold this end of the spring is $kx$, where
$k$ is the modulus of the spring.

Let us compute the work that must be done to move the free end of the
spring from position $x = a$ to $x = b$.

Regarding the work $A(\alpha, \beta)$ as an additive function of the interval $[\alpha, \beta]$
and assuming valid the estimates

$$\inf_{x \in [\alpha,\beta]} (kx)\,(\beta - \alpha) \le A(\alpha, \beta) \le \sup_{x \in [\alpha,\beta]} (kx)\,(\beta - \alpha) ,$$

we arrive via Proposition 1 at the conclusion that

$$A(a, b) = \int_a^b kx\,dx = \frac{kx^2}{2}\Big|_a^b .$$

This work is done against the force. The work done by the spring during the
same displacement differs only in sign.

The function $U(x) = \frac{kx^2}{2}$ that we have found enables us to compute the
work we do in changing the state of the spring, and hence the work that
the spring must do in returning to its initial state. Such a function $U(x)$,
which depends only on the configuration of the system, is called the potential
energy of the system. It is clear from the construction that the derivative of
the potential energy gives the force of the spring with the opposite sign.

If a point of mass $m$ moves along the axis subject to this elastic force
$F = -kx$, its coordinate $x(t)$ as a function of time satisfies the equation

$$m\ddot x = -kx . \tag{6.59}$$

We have already verified once (see Subsect. 5.6.6) that the quantity

$$\frac{mv^2}{2} + \frac{kx^2}{2} = K(t) + U(x(t)) = E , \tag{6.60}$$

which is the sum of the kinetic and (as we now understand) potential energies
of the system, remains constant during the motion.


Example 9. We now consider another example, in which we shall encounter
a number of concepts that we have introduced and become familiar
with in differential and integral calculus.
We begin by remarking that by analogy with the function (6.60), which
was written for a particular mechanical system satisfying Eq. (6.59), one can
verify that for an arbitrary equation of the form

$$\ddot s(t) = f(s(t)) , \tag{6.61}$$

where $f(s)$ is a given function, the sum

$$\frac{\dot s^2}{2} + U(s) = E \tag{6.62}$$

does not vary over time if $U'(s) = -f(s)$.
Indeed,

$$\frac{dE}{dt} = \frac12\frac{d\dot s^2}{dt} + \frac{dU(s)}{dt} = \dot s\ddot s + \frac{dU}{ds}\cdot\frac{ds}{dt} = \dot s\,(\ddot s - f(s)) = 0 .$$

Thus by (6.62), regarding $E$ as a constant, we obtain successively, first

$$\dot s = \pm\sqrt{2(E - U(s))}$$

(where the sign must correspond to the sign of the derivative $\frac{ds}{dt}$), then

$$\frac{dt}{ds} = \pm\frac{1}{\sqrt{2(E - U(s))}} ,$$

and finally

$$t = c_1 \pm \int \frac{ds}{\sqrt{2(E - U(s))}} .$$

Consequently, using the law of conservation of the “energy” (6.62) in Eq.
(6.61), we have succeeded theoretically in solving this equation by finding
not the function $s(t)$, but its inverse $t(s)$.

The equation (6.61) arises, for example, in describing the motion of a
point along a given curve. Suppose a particle moves under the influence of
the force of gravity along a narrow ideally smooth track (Fig. 6.3).

Let $s(t)$ be the distance along the track (that is, the length of the path)
from a fixed point $O$, the origin of the measurement, to the point where the
particle is at time $t$. It is clear that then $\dot s(t)$ is the magnitude of the velocity
of the particle and $\ddot s(t)$ is the magnitude of the tangential component of its
acceleration, which must equal the magnitude of the tangential component
of the force of gravity at a given point of the track. It is also clear that the
tangential component of the force of gravity depends only on the point of the
track, that is, it depends only on $s$, since $s$ can be regarded as a parameter
that parametrizes the curve⁹ with which we are identifying the track. If we
denote this component of the force of gravity by $f(s)$, we find that

$$m\ddot s = f(s) .$$

Fig. 6.3.


For this equation the following quantity will be preserved:

$$\frac12 m\dot s^2 + U(s) = E ,$$

where $U'(s) = -f(s)$.

Since the term $\frac12 m\dot s^2$ is the kinetic energy of the point and the motion
along the track is frictionless, we can guess, avoiding calculations, that the
function $U(s)$, up to a constant term, must have the form $mgh(s)$, where
$mgh(s)$ is the potential energy of a point of height $h(s)$ in the gravitational
field.

If the relations $\dot s(0) = 0$, $s(0) = s_0$, and $h(s_0) = h_0$ held at the initial
time $t = 0$, then by the relations

$$\frac{2E}{m} = \dot s^2 + 2gh(s) = C$$

we find that $C = 2gh(s_0)$, and therefore $\dot s^2 = 2g(h_0 - h(s))$ and

$$t = \int_{s_0}^{s} \frac{ds}{\sqrt{2g(h_0 - h(s))}} . \tag{6.63}$$

In particular, suppose that, as in the case of a pendulum, the point moves
along a circle of radius $R$, the length $s$ is measured from the lowest point $O$
of the circle, and the initial conditions amount to the equality $\dot s(0) = 0$ at
$t = 0$ and a given initial angle of displacement $-\varphi_0$ (see Fig. 6.4). Then, as one
can verify, expressing $s$ and $h(s)$ in terms of the angle of displacement $\varphi$, we
obtain

$$t = \int_{s_0}^{s} \frac{ds}{\sqrt{2g(h_0 - h(s))}} = \int_{-\varphi_0}^{\varphi} \frac{R\,d\psi}{\sqrt{2gR(\cos\psi - \cos\varphi_0)}} ,$$

or

$$t = \frac12\sqrt{\frac{R}{g}} \int_{-\varphi_0}^{\varphi} \frac{d\psi}{\sqrt{\sin^2\frac{\varphi_0}{2} - \sin^2\frac{\psi}{2}}} . \tag{6.64}$$

Fig. 6.4.

Thus for a half-period $\frac12 T$ of oscillation of the pendulum we obtain

$$\frac12 T = \frac12\sqrt{\frac{R}{g}} \int_{-\varphi_0}^{\varphi_0} \frac{d\psi}{\sqrt{\sin^2\frac{\varphi_0}{2} - \sin^2\frac{\psi}{2}}} , \tag{6.65}$$

from which, after the substitution $\frac{\sin(\psi/2)}{\sin(\varphi_0/2)} = \sin\theta$, we find

$$T = 4\sqrt{\frac{R}{g}} \int_0^{\pi/2} \frac{d\theta}{\sqrt{1 - k^2\sin^2\theta}} , \tag{6.66}$$

where $k^2 = \sin^2\frac{\varphi_0}{2}$.

We recall that the function

$$F(k, \varphi) = \int_0^{\varphi} \frac{d\theta}{\sqrt{1 - k^2\sin^2\theta}}$$

is called an elliptic integral of first kind in the Legendre form. For $\varphi = \pi/2$
it depends only on $k^2$, is denoted $K(k)$, and is called the complete elliptic
integral of first kind. Thus, the period of oscillation of the pendulum is

$$T = 4\sqrt{\frac{R}{g}}\,K(k) . \tag{6.67}$$

If the initial displacement angle $\varphi_0$ is small, we can set $k = 0$, and then
we obtain the approximate formula

$$T \approx 2\pi\sqrt{\frac{R}{g}} . \tag{6.68}$$
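To see how good the approximation (6.68) is, one can evaluate (6.67) numerically. The sketch below is illustrative only: the names `complete_K`, `pendulum_period`, and `small_angle_period`, the midpoint rule, and the sample values of $R$, $g$, and $\varphi_0$ are assumptions, not part of the text.

```python
import math

def complete_K(k, n=2000):
    """Midpoint-rule approximation of K(k), the integral of
    1 / sqrt(1 - k^2 sin^2(theta)) over [0, pi/2], for 0 <= k < 1."""
    h = (math.pi / 2) / n
    return h * sum(1.0 / math.sqrt(1.0 - (k * math.sin((i + 0.5) * h)) ** 2)
                   for i in range(n))

def pendulum_period(R, g, phi0):
    """Period from (6.67) with k = sin(phi0 / 2)."""
    k = math.sin(phi0 / 2.0)
    return 4.0 * math.sqrt(R / g) * complete_K(k)

def small_angle_period(R, g):
    """Approximate period from (6.68)."""
    return 2.0 * math.pi * math.sqrt(R / g)

R, g = 1.0, 9.81  # sample length (m) and gravitational acceleration (m/s^2)
for phi0_degrees in (1, 10, 45, 90):
    phi0 = math.radians(phi0_degrees)
    ratio = pendulum_period(R, g, phi0) / small_angle_period(R, g)
    print(phi0_degrees, ratio)  # ratio of the period to its small-angle value
```

The ratio stays close to 1 for small amplitudes and grows with $\varphi_0$, which is the sense in which (6.68) is a small-oscillation approximation.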


⁹ The parametrization of a curve by its own arc length is called its natural
parametrization, and $s$ is called the natural parameter.

Now that formula (6.66) has been obtained it is still necessary to examine
the whole chain of reasoning. When we do, we notice that the integrands
in the integrals (6.63)–(6.65) are unbounded functions on the interval of
integration. We encountered a similar difficulty in studying the length of a
curve, so that we have an approximate idea of how to interpret the integrals
(6.63)–(6.65).

However, given that this problem has arisen for the second time, we should
study it in a precise mathematical formulation, as will be done in the next
section.


Example 10. A body of mass $m$ rises above the surface of the Earth along
the trajectory $t \mapsto (x(t), y(t), z(t))$, where $t$ is time, $a \le t \le b$, and $x, y, z$ are
the Cartesian coordinates of the point in space. It is required to compute the
work of the body against the force of gravity during the time interval $[a, b]$.

The work $A(\alpha, \beta)$ is an additive function of the time interval $[\alpha, \beta] \subset [a,b]$.

A constant force $F$ acting on a body moving with constant velocity $v$
performs work $\langle F, vh\rangle = \langle F, v\rangle h$ in time $h$, and so the estimate

$$\inf_{t \in [\alpha,\beta]} \langle F(p(t)), v(t)\rangle\,(\beta - \alpha) \le A(\alpha, \beta) \le \sup_{t \in [\alpha,\beta]} \langle F(p(t)), v(t)\rangle\,(\beta - \alpha)$$

seems natural, where $v(t)$ is the velocity of the body at time $t$, $p(t)$ is the
point in space where the body is located at time $t$, and $F(p(t))$ is the force
acting on the body at the point $p = p(t)$.

If the function $\langle F(p(t)), v(t)\rangle$ happens to be integrable, then by
Proposition 1 we must conclude that

$$A(a, b) = \int_a^b \langle F(p(t)), v(t)\rangle\,dt .$$

In the present case $v(t) = (\dot x(t), \dot y(t), \dot z(t))$, and if $r(t) = (x(t), y(t), z(t))$,
then by the law of universal gravitation, we find

$$F(p) = F(x,y,z) = G\,\frac{mM}{|r|^3}\,r = \frac{GmM}{(x^2 + y^2 + z^2)^{3/2}}\,(x, y, z) ,$$

where $M$ is the mass of the Earth and its center is taken as the origin of the
coordinate system.
Then,

$$\langle F, v\rangle(t) = GmM\,\frac{x(t)\dot x(t) + y(t)\dot y(t) + z(t)\dot z(t)}{(x^2(t) + y^2(t) + z^2(t))^{3/2}} ,$$

and therefore

$$\int_a^b \langle F, v\rangle(t)\,dt = \frac12\,GmM\int_a^b \frac{(x^2(t) + y^2(t) + z^2(t))'}{(x^2(t) + y^2(t) + z^2(t))^{3/2}}\,dt = -\frac{GmM}{(x^2(t) + y^2(t) + z^2(t))^{1/2}}\Big|_a^b = -\frac{GmM}{|r(t)|}\Big|_a^b .$$
Thus

$$A(a,b) = \frac{GmM}{|r(a)|} - \frac{GmM}{|r(b)|} .$$

We have discovered that the work we were seeking depends only on the
magnitudes $|r(a)|$ and $|r(b)|$ of the distance of the body of mass $m$ from the
center of the Earth at the initial and final instants of time in the interval
$[a,b]$.

Setting

$$U(r) = \frac{GM}{r} ,$$

we find that the work done against gravity in displacing the mass $m$ from
any point of a sphere of radius $r_0$ to any point of a sphere of radius $r_1$ is
computed by the formula

$$A_{r_0 r_1} = m\,(U(r_0) - U(r_1)) .$$

The function $U(r)$ is called a Newtonian potential. If we denote the radius
of the Earth by $R$, then, since $\frac{GM}{R^2} = g$, we can rewrite $U(r)$ as

$$U(r) = \frac{gR^2}{r} .$$

Taking this into account, we can obtain the following expression for the
work needed to escape from the Earth’s gravitational field, more precisely, to
move a body of mass $m$ from the surface of the Earth to an infinite distance
from the center of the Earth. It is natural to take that quantity to be the
limit $\lim_{r \to +\infty} A_{Rr}$.

Thus the escape work is

$$A = A_{R\infty} = \lim_{r \to +\infty} A_{Rr} = \lim_{r \to +\infty} m\left(\frac{gR^2}{R} - \frac{gR^2}{r}\right) = mgR .$$
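The fact that the work in Example 10 depends only on $|r(a)|$ and $|r(b)|$ can be tested on a sample trajectory. In the sketch below the spiral path, the normalization $GmM = 1$, the helper names, and the midpoint rule are assumptions made purely for illustration.

```python
import math

GmM = 1.0  # units chosen so that G*m*M = 1 (an illustrative normalization)

def position(t):
    """A hypothetical rising spiral r(t), with |r(0)| = 1 and |r(2)| = sqrt(10)."""
    return ((1.0 + t) * math.cos(t), (1.0 + t) * math.sin(t), 0.5 * t)

def velocity(t):
    """Derivative of position(t), computed by hand."""
    return (math.cos(t) - (1.0 + t) * math.sin(t),
            math.sin(t) + (1.0 + t) * math.cos(t),
            0.5)

def power(t):
    """The integrand <F(p(t)), v(t)> with F = GmM * r / |r|^3."""
    x, y, z = position(t)
    vx, vy, vz = velocity(t)
    return GmM * (x * vx + y * vy + z * vz) / (x * x + y * y + z * z) ** 1.5

def work(a, b, n=20000):
    """Midpoint-rule approximation of the work integral over [a, b]."""
    h = (b - a) / n
    return h * sum(power(a + (i + 0.5) * h) for i in range(n))

def norm(p):
    return math.sqrt(sum(c * c for c in p))

numeric = work(0.0, 2.0)
closed_form = GmM / norm(position(0.0)) - GmM / norm(position(2.0))
print(numeric, closed_form)
```

Any other smooth trajectory with the same initial and final distances from the origin should give the same value, up to the quadrature error.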


6.4.6 Problems and Exercises 


1. Fig. 6.5 shows the graph of the dependence $F = F(x)$ of a force acting along
the $x$-axis on a test particle located at the point $x$ on the axis.

a) Sketch the potential for this force in the same coordinates.

b) Describe the potential of the force $-F(x)$.

c) Investigate to determine which of these two cases is such that the position $x_0$
of the particle is a stable equilibrium position and what property of the potential
is involved in stability.


2. Based on the result of Example 10, compute the velocity a body must have in 
order to escape from the gravitational field of the Earth (the escape velocity for the 
Earth). 


392 6 Integration 


Fig. 6.5. 


3. On the basis of Example 9

a) derive the equation $R\ddot\varphi = -g\sin\varphi$ for the oscillations of a mathematical
pendulum;

b) assuming the oscillations are small, obtain an approximate solution of this
equation;

c) from the approximate solution, determine the period of oscillation of the
pendulum and compare the result with formula (6.68).


4. A wheel of radius $r$ rolls without slipping over a horizontal plane at a uniform
velocity $v$. Suppose at time $t = 0$ the uppermost point $A$ of the wheel has
coordinates $(0, 2r)$ in a Cartesian coordinate system whose $x$-axis lies in the plane
and is directed along the velocity vector.

a) Write the law of motion $t \mapsto (x(t), y(t))$ of the point $A$.

b) Find the velocity of $A$ as a function of time.

c) Describe graphically the trajectory of $A$. (This curve is called a cycloid.)

d) Find the length of one arch of the cycloid (the length of one period of this
periodic curve).

e) The cycloid has a number of interesting properties, one of which, discovered
by Huygens,¹⁰ is that the period of oscillation of a cycloidal pendulum (a ball rolling
in a cycloidal well) is independent of the height to which it rises above the lowest
point of the well. Try to prove this, using Example 9. (See also Problem 6 of the
next section, which is devoted to improper integrals.)


5. a) Starting from Fig. 6.6, explain why, if $y = f(x)$ and $x = g(y)$ are a pair of
mutually inverse continuous nonnegative functions equal to 0 at $x = 0$ and $y = 0$
respectively, then the inequality

$$xy \le \int_0^x f(t)\,dt + \int_0^y g(t)\,dt$$

must hold.


¹⁰ Ch. Huygens (1629–1695), Dutch engineer, physicist, mathematician, and
astronomer.

Fig. 6.6.


b) Obtain Young’s inequality

$$xy \le \frac1p x^p + \frac1q y^q$$

from a) for $x, y \ge 0$, $p, q > 0$, $\frac1p + \frac1q = 1$.


c) What geometric meaning does equality have in the inequalities of a) and b)?


6. The Buffon needle problem.¹¹ The number $\pi$ can be computed in the following
rather surprising way.

We take a large sheet of paper, ruled into parallel lines a distance $h$ apart and
we toss a needle of length $l < h$ at random onto it. Suppose we have thrown the
needle $N$ times, and on $n$ of those times the needle landed across one of the lines.
If $N$ is sufficiently large, then $\pi \approx \frac{2l}{ph}$, where $p = \frac{n}{N}$ is the approximate probability
that the needle will land across a line.

Starting from geometric considerations connected with the computation of area,
try to give a satisfactory explanation of this method of computing $\pi$.
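The experiment itself is easy to imitate on a computer. The following sketch is a simulation under stated assumptions, not an explanation of the method: the function name `buffon_pi`, the fixed random seed, the sample values of $l$, $h$, and $N$, and the rejection sampling used to pick a random direction are all choices made here for illustration.

```python
import math
import random

def buffon_pi(l, h, N, seed=0):
    """Estimate pi by simulating N tosses of a needle of length l <= h
    onto paper ruled with parallel lines a distance h apart."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    crossings = 0
    for _ in range(N):
        y = rng.uniform(0.0, h)  # height of the needle's center between two lines
        # Pick a uniformly random direction by rejection sampling in the unit
        # disk, so that the simulation itself does not use the value of pi.
        while True:
            u, v = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            s = u * u + v * v
            if 0.0 < s <= 1.0:
                break
        half_height = 0.5 * l * abs(v) / math.sqrt(s)
        if y - half_height <= 0.0 or y + half_height >= h:
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; increase N")
    p = crossings / N
    return 2.0 * l / (p * h)

estimate = buffon_pi(l=1.5, h=2.0, N=200000)
print(estimate)
```

The estimate fluctuates from seed to seed and converges slowly, roughly like $1/\sqrt{N}$, so this is a curiosity rather than a practical way of computing $\pi$.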


6.5 Improper Integrals 


In the preceding section we encountered the need for a somewhat broader 
concept of the Riemann integral. There, in studying a particular problem, 
we formed an idea of the direction in which this should be done. The present 
section is devoted to carrying out those ideas. 


6.5.1 Definition, Examples, and Basic Properties 
of Improper Integrals 


Definition 1. Suppose the function $x \mapsto f(x)$ is defined on the interval
$[a, +\infty[$ and integrable on every closed interval $[a,b]$ contained in that
interval.


11 J.L. L. Buffon (1707-1788) — French experimental scientist. 




The quantity

$$\int_a^{+\infty} f(x)\,dx := \lim_{b \to +\infty} \int_a^b f(x)\,dx ,$$

if this limit exists, is called the improper Riemann integral or the improper
integral of the function $f$ over the interval $[a, +\infty[$.

The expression $\int_a^{+\infty} f(x)\,dx$ itself is also called an improper integral, and
in that case we say that the integral converges if the limit exists and diverges
otherwise. Thus the question of the convergence of an improper integral is
equivalent to the question whether the improper integral is defined or not.


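Example 1. Let us investigate the values of the parameter $\alpha$ for which the
improper integral

$$\int_1^{+\infty} \frac{dx}{x^\alpha} \tag{6.69}$$

converges, or what is the same, is defined.
Since

$$\int_1^b \frac{dx}{x^\alpha} = \begin{cases} \dfrac{1}{1-\alpha}\,x^{1-\alpha}\Big|_1^b & \text{for } \alpha \ne 1 , \\[2mm] \ln x\,\Big|_1^b & \text{for } \alpha = 1 , \end{cases}$$

the limit

$$\lim_{b \to +\infty} \int_1^b \frac{dx}{x^\alpha} = \frac{1}{\alpha - 1}$$

exists only for $\alpha > 1$.
Thus,

$$\int_1^{+\infty} \frac{dx}{x^\alpha} = \frac{1}{\alpha - 1} , \quad \text{if } \alpha > 1 ,$$

and for other values of the parameter $\alpha$ the integral (6.69) diverges, that is,
is not defined.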
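The behavior of the partial integrals as $b$ grows can be tabulated directly from the antiderivative above. The sketch below is only an illustration; the function name `integral_1_to_b` and the sample values of $\alpha$ and $b$ are assumptions.

```python
import math

def integral_1_to_b(alpha, b):
    """Closed form of the integral of x**(-alpha) over [1, b], for b >= 1."""
    if alpha == 1:
        return math.log(b)
    return (b ** (1 - alpha) - 1) / (1 - alpha)

for alpha in (2.0, 1.0, 0.5):
    values = [integral_1_to_b(alpha, 10.0 ** p) for p in (1, 3, 6)]
    print(alpha, values)
```

For $\alpha = 2$ the values approach $1/(\alpha - 1) = 1$, while for $\alpha = 1$ and $\alpha = 0.5$ they keep growing with $b$, in agreement with the divergence of (6.69) for $\alpha \le 1$.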


Definition 2. Suppose the function $x \mapsto f(x)$ is defined on the interval
$[a, B[$ and integrable on any closed interval $[a,b] \subset [a, B[$. The quantity

$$\int_a^B f(x)\,dx := \lim_{b \to B-0} \int_a^b f(x)\,dx ,$$

if this limit exists, is called the improper integral of $f$ over the interval $[a, B[$.

The essence of this definition is that in any neighborhood of $B$ the function
$f$ may happen to be unbounded.

Similarly, if a function $x \mapsto f(x)$ is defined on the interval $]A, b]$ and
integrable on every closed interval $[a,b] \subset\, ]A, b]$, then by definition we set

$$\int_A^b f(x)\,dx := \lim_{a \to A+0} \int_a^b f(x)\,dx ,$$

and also by definition we set

$$\int_{-\infty}^b f(x)\,dx := \lim_{a \to -\infty} \int_a^b f(x)\,dx .$$


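Example 2. Let us investigate the values of the parameter $\alpha$ for which the
integral

$$\int_0^1 \frac{dx}{x^\alpha} \tag{6.70}$$

converges.
Since for $a \in\, ]0, 1]$

$$\int_a^1 \frac{dx}{x^\alpha} = \begin{cases} \dfrac{1}{1-\alpha}\,x^{1-\alpha}\Big|_a^1 & \text{if } \alpha \ne 1 , \\[2mm] \ln x\,\Big|_a^1 & \text{if } \alpha = 1 , \end{cases}$$

it follows that the limit

$$\lim_{a \to +0} \int_a^1 \frac{dx}{x^\alpha} = \frac{1}{1 - \alpha}$$

exists only for $\alpha < 1$.
Thus the integral (6.70) is defined only for $\alpha < 1$.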


Example 3.

$$\int_{-\infty}^0 e^x\,dx = \lim_{a \to -\infty} \int_a^0 e^x\,dx = \lim_{a \to -\infty} \left(e^x\Big|_a^0\right) = \lim_{a \to -\infty} (1 - e^a) = 1 .$$


Since the question of the convergence of an improper integral is answered 
in the same way for both integrals over an infinite interval and functions un- 
bounded near one of the endpoints of a finite interval of integration, we shall 
study both of these cases together from now on, introducing the following 
basic definition. 
Definition 3. Let $[a, \omega[$ be a finite or infinite interval and $x \mapsto f(x)$ a
function defined on that interval and integrable over every closed interval
$[a,b] \subset [a, \omega[$. Then by definition

$$\int_a^\omega f(x)\,dx := \lim_{b \to \omega} \int_a^b f(x)\,dx , \tag{6.71}$$

if this limit exists as $b \to \omega$, $b \in [a, \omega[$.


From now on, unless otherwise stated, when studying the improper in- 
tegral (6.71) we shall assume that the integrand satisfies the hypotheses of 
Definition 3. 

Moreover, for the sake of definiteness we shall assume that the singularity 
(“impropriety”) of the integral arises from only the upper limit of integration. 
The study of the case when it arises from the lower limit is carried out word 
for word in exactly the same way. 

From Definition 3, properties of the integral, and properties of the limit, 
one can draw the following conclusions about properties of an improper in- 
tegral. 


Proposition 1. Suppose $x \mapsto f(x)$ and $x \mapsto g(x)$ are functions defined on an
interval $[a, \omega[$ and integrable on every closed interval $[a,b] \subset [a, \omega[$. Suppose
the improper integrals

$$\int_a^\omega f(x)\,dx , \tag{6.72}$$

$$\int_a^\omega g(x)\,dx \tag{6.73}$$

are defined.
Then

a) if $\omega \in \mathbb{R}$ and $f \in \mathcal{R}[a, \omega]$, the values of the integral (6.72) are the
same, whether it is interpreted as a proper or an improper integral;

b) for any $\lambda_1, \lambda_2 \in \mathbb{R}$ the function $(\lambda_1 f + \lambda_2 g)(x)$ is integrable in the
improper sense on $[a, \omega[$ and the following equality holds:

$$\int_a^\omega (\lambda_1 f + \lambda_2 g)(x)\,dx = \lambda_1\int_a^\omega f(x)\,dx + \lambda_2\int_a^\omega g(x)\,dx ;$$

c) if $c \in [a, \omega[$, then

$$\int_a^\omega f(x)\,dx = \int_a^c f(x)\,dx + \int_c^\omega f(x)\,dx ;$$

d) if $\varphi : [\alpha, \gamma[\, \to [a, \omega[$ is a smooth strictly monotonic mapping with
$\varphi(\alpha) = a$ and $\varphi(\beta) \to \omega$ as $\beta \to \gamma$, $\beta \in [\alpha, \gamma[$, then the improper
integral of the function $t \mapsto (f \circ \varphi)(t)\varphi'(t)$ over $[\alpha, \gamma[$ exists and the following
equality holds:

$$\int_a^\omega f(x)\,dx = \int_\alpha^\gamma (f \circ \varphi)(t)\varphi'(t)\,dt .$$


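Proof. Part a) follows from the continuity of the function

$$F(b) = \int_a^b f(x)\,dx$$

on the closed interval $[a,\omega]$ on which $f \in \mathcal{R}[a,\omega]$.
Part b) follows from the fact that for $b \in [a,\omega[$

$$\int_a^b (\lambda_1 f + \lambda_2 g)(x)\,dx = \lambda_1\int_a^b f(x)\,dx + \lambda_2\int_a^b g(x)\,dx .$$

Part c) follows from the equality

$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx ,$$

which holds for all $b, c \in [a,\omega[$.

Part d) follows from the formula for change of variable in the definite
integral:

$$\int_{a=\varphi(\alpha)}^{b=\varphi(\beta)} f(x)\,dx = \int_\alpha^\beta (f\circ\varphi)(t)\varphi'(t)\,dt . \qquad \square$$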


Remark 1. To the properties of the improper integral expressed in Propo- 
sition 1 we should add the very useful rule for integration by parts in an 
improper integral, which we give in the following formulation: 


If $f, g \in C^{(1)}[a,\omega[$ and the limit $\lim_{x\to\omega,\ x\in[a,\omega[} (f\cdot g)(x)$ exists, then the functions
$f\cdot g'$ and $f'\cdot g$ are either both integrable or both nonintegrable in the improper
sense on $[a,\omega[$, and when they are integrable the following equality holds:

$$\int_a^\omega (f\cdot g')(x)\,dx = (f\cdot g)(x)\Big|_a^\omega - \int_a^\omega (f'\cdot g)(x)\,dx ,$$

where

$$(f\cdot g)(x)\Big|_a^\omega = \lim_{x\to\omega,\ x\in[a,\omega[} (f\cdot g)(x) - (f\cdot g)(a) .$$

Proof. This follows from the formula

$$\int_a^b (f\cdot g')(x)\,dx = (f\cdot g)(x)\Big|_a^b - \int_a^b (f'\cdot g)(x)\,dx$$

for integration by parts in a proper integral. $\square$


Remark 2. It is clear from part c) of Proposition 1 that the improper integrals

$$\int_a^\omega f(x)\,dx \quad\text{and}\quad \int_c^\omega f(x)\,dx$$

either both converge or both diverge. Thus, in improper integrals, as in series,
convergence is independent of any initial piece of the series or integral.

For that reason, when posing the question of convergence of an improper 
integral, we sometimes omit entirely the limit of integration at which the 
integral does not have a singularity. 

With that convention the results obtained in Examples 1 and 2 can be
rewritten as follows:

the integral $\int^{+\infty} \frac{dx}{x^\alpha}$ converges only for $\alpha > 1$;

the integral $\int_{+0} \frac{dx}{x^\alpha}$ converges only for $\alpha < 1$.

The sign $+0$ in the last integral shows that the region of integration is
contained in $x > 0$.
By a change of variable in this last integral, we immediately find that

the integral $\int_{x_0+0} \frac{dx}{(x-x_0)^\alpha}$ converges only for $\alpha < 1$.

6.5.2 Convergence of an Improper Integral 


a. The Cauchy Criterion By Definition 3, the convergence of the improper
integral (6.71) is equivalent to the existence of a limit for the function

$$F(b) = \int_a^b f(x)\,dx \tag{6.74}$$

as $b \to \omega$, $b \in [a,\omega[$.
This relation is the reason why the following proposition holds.

Proposition 2. (Cauchy criterion for convergence of an improper integral).
If the function $x \mapsto f(x)$ is defined on the interval $[a,\omega[$ and integrable on
every closed interval $[a,b] \subset [a,\omega[$, then the integral $\int_a^\omega f(x)\,dx$ converges if
and only if for every $\varepsilon > 0$ there exists $B \in [a,\omega[$ such that the relation

$$\left|\int_{b_1}^{b_2} f(x)\,dx\right| < \varepsilon$$

holds for any $b_1, b_2 \in [a,\omega[$ satisfying $B < b_1$ and $B < b_2$.

Proof. As a matter of fact, we have

$$\int_{b_1}^{b_2} f(x)\,dx = \int_a^{b_2} f(x)\,dx - \int_a^{b_1} f(x)\,dx = F(b_2) - F(b_1) ,$$

and therefore the condition is simply the Cauchy criterion for the existence
of a limit for the function $F(b)$ as $b \to \omega$, $b \in [a,\omega[$. $\square$


b. Absolute Convergence of an Improper Integral 


Definition 4. The improper integral $\int_a^\omega f(x)\,dx$ converges absolutely if the
integral $\int_a^\omega |f|(x)\,dx$ converges.

Because of the inequality

$$\left|\int_{b_1}^{b_2} f(x)\,dx\right| \le \left|\int_{b_1}^{b_2} |f|(x)\,dx\right|$$

and Proposition 2, we can conclude that if an integral converges absolutely, 
then it converges. 

The study of absolute convergence reduces to the study of convergence 
of integrals of nonnegative functions. But in this case we have the following 
proposition. 


Proposition 3. If a function $f$ satisfies the hypotheses of Definition 3 and
$f(x) \ge 0$ on $[a,\omega[$, then the improper integral (6.71) exists if and only if the
function (6.74) is bounded on $[a,\omega[$.

Proof. Indeed, if $f(x) \ge 0$ on $[a,\omega[$, then the function (6.74) is nondecreasing
on $[a,\omega[$, and therefore it has a limit as $b \to \omega$, $b \in [a,\omega[$, if and only if it is
bounded. $\square$
As an example of the use of this proposition, we consider the following
corollary of it.

Corollary 1. (Integral test for convergence of a series). If the function
$x \mapsto f(x)$ is defined on the interval $[1,+\infty[$, nonnegative, nonincreasing,
and integrable on each closed interval $[1,b] \subset [1,+\infty[$, then the series

$$\sum_{n=1}^\infty f(n) = f(1) + f(2) + \cdots$$

and the integral

$$\int_1^{+\infty} f(x)\,dx$$

either both converge or both diverge.

Proof. It follows from the hypotheses that the inequalities

$$f(n+1) \le \int_n^{n+1} f(x)\,dx \le f(n)$$

hold for any $n \in \mathbb{N}$. After summing these inequalities, we obtain

$$\sum_{n=1}^k f(n+1) \le \int_1^{k+1} f(x)\,dx \le \sum_{n=1}^k f(n)$$

or

$$s_{k+1} - f(1) \le F(k+1) \le s_k ,$$

where $s_k = \sum_{n=1}^k f(n)$ and $F(b) = \int_1^b f(x)\,dx$. Since $s_k$ and $F(b)$ are nondecreasing
functions of their arguments, these inequalities prove the proposition. $\square$
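
The two-sided estimate in the proof of Corollary 1 can be observed on a concrete function. The sketch below is an illustration under stated assumptions, not part of the original text: it takes $f(x) = 1/x^2$ (a choice made here), uses the closed form $F(b) = 1 - 1/b$, and checks $s_{k+1} - f(1) \le F(k+1) \le s_k$. All function names are hypothetical.

```python
def f(x: float) -> float:
    """Sample nonnegative, nonincreasing function on [1, +inf)."""
    return 1.0 / (x * x)


def partial_sum(k: int) -> float:
    """s_k = f(1) + ... + f(k)."""
    return sum(f(n) for n in range(1, k + 1))


def F(b: float) -> float:
    """Integral of 1/x**2 over [1, b], from the antiderivative -1/x."""
    return 1.0 - 1.0 / b


def sandwich_holds(k: int) -> bool:
    """Check s_{k+1} - f(1) <= F(k+1) <= s_k."""
    return partial_sum(k + 1) - f(1) <= F(k + 1) <= partial_sum(k)


if __name__ == "__main__":
    print(all(sandwich_holds(k) for k in range(1, 100)))
```

Since $F$ is bounded by 1 here, the left inequality bounds every partial sum by $f(1) + 1 = 2$, which is how boundedness passes from the integral to the series.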


In particular, one can say that the result of Example 1 is equivalent to
the assertion that the series

$$\sum_{n=1}^\infty \frac{1}{n^\alpha}$$

converges only for $\alpha > 1$.
The most frequently used corollary of Proposition 3 is the following the- 
orem. 


Theorem 1. (Comparison theorem). Suppose the functions $x \mapsto f(x)$ and
$x \mapsto g(x)$ are defined on the interval $[a,\omega[$ and integrable on any closed
interval $[a,b] \subset [a,\omega[$.
If

$$0 \le f(x) \le g(x)$$

on $[a,\omega[$, then convergence of the integral (6.73) implies convergence of (6.72)
and the inequality

$$\int_a^\omega f(x)\,dx \le \int_a^\omega g(x)\,dx$$

holds. Divergence of the integral (6.72) implies divergence of (6.73).

Proof. From the hypotheses of the theorem and the inequalities for proper
Riemann integrals we have

$$F(b) = \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx = G(b)$$

for any $b \in [a,\omega[$. Since both functions $F$ and $G$ are nondecreasing on $[a,\omega[$,
the theorem follows from this inequality and Proposition 3. $\square$


Remark 3. If instead of satisfying the inequalities $0 \le f(x) \le g(x)$ the functions
$f$ and $g$ in the theorem are known to be nonnegative and of the same
order as $x \to \omega$, $x \in [a,\omega[$, that is, there are positive constants $c_1$ and $c_2$ such
that

$$c_1 f(x) \le g(x) \le c_2 f(x) ,$$

then by the linearity of the improper integral and the theorem just proved, in
this case we can conclude that the integrals (6.72) and (6.73) either both
converge or both diverge.


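Example 4. The integral

$$\int^{+\infty} \frac{\sqrt{x}\,dx}{\sqrt{1+x^4}}$$

converges, since

$$\frac{\sqrt{x}}{\sqrt{1+x^4}} \sim \frac{1}{x^{3/2}}$$

as $x \to +\infty$.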
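Example 5. The integral

$$\int_1^{+\infty} \frac{\cos x}{x^2}\,dx$$

converges absolutely, since

$$\left|\frac{\cos x}{x^2}\right| \le \frac{1}{x^2}$$

for $x \ge 1$. Consequently,

$$\left|\int_1^{+\infty} \frac{\cos x}{x^2}\,dx\right| \le \int_1^{+\infty} \left|\frac{\cos x}{x^2}\right| dx \le \int_1^{+\infty} \frac{dx}{x^2} = 1 .$$

Example 6. The integral

$$\int_1^{+\infty} e^{-x^2}\,dx$$

converges, since $e^{-x^2} < e^{-x}$ for $x > 1$ and

$$\int_1^{+\infty} e^{-x^2}\,dx < \int_1^{+\infty} e^{-x}\,dx = \frac{1}{e} .$$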

Example 7. The integral

$$\int^{+\infty} \frac{dx}{\ln x}$$

diverges, since

$$\frac{1}{\ln x} > \frac{1}{x}$$

for sufficiently large values of $x$.


Example 8. The Euler integral

$$\int_0^{\pi/2} \ln \sin x\,dx$$

converges, since

$$|\ln \sin x| \sim |\ln x| < \frac{1}{\sqrt{x}}$$

as $x \to +0$.


Example 9. The elliptic integral

$$\int_0^1 \frac{dx}{\sqrt{(1-x^2)(1-k^2x^2)}}$$

converges for $0 \le k^2 < 1$, since

$$\sqrt{(1-x^2)(1-k^2x^2)} \sim \sqrt{2(1-k^2)}\,(1-x)^{1/2}$$

as $x \to 1-0$.
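Example 10. The integral

$$\int_0^\varphi \frac{d\theta}{\sqrt{\cos\theta - \cos\varphi}}$$

converges, since

$$\sqrt{\cos\theta - \cos\varphi} = \sqrt{2\sin\frac{\varphi+\theta}{2}\,\sin\frac{\varphi-\theta}{2}} \sim \sqrt{\sin\varphi}\,(\varphi-\theta)^{1/2}$$

as $\theta \to \varphi - 0$.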


Example 11. The integral

$$T = 2\sqrt{\frac{L}{g}} \int_0^{\varphi_0} \frac{d\psi}{\sqrt{\sin^2\frac{\varphi_0}{2} - \sin^2\frac{\psi}{2}}} \tag{6.75}$$

converges for $0 < \varphi_0 < \pi$ since as $\psi \to \varphi_0 - 0$ we have

$$\sqrt{\sin^2\frac{\varphi_0}{2} - \sin^2\frac{\psi}{2}} \sim \sqrt{\tfrac12\sin\varphi_0}\,(\varphi_0 - \psi)^{1/2} . \tag{6.76}$$

Relation (6.75) expresses the dependence of the period of oscillations of a
pendulum on its length $L$ and its initial angle of displacement, measured from
the radius it occupies at the lowest point of its trajectory. Formula (6.75) is
an elementary version of formula (6.65) of the preceding section.

A pendulum can be thought of, for example, as consisting of a weightless
rod, one end of which is attached by a hinge while the other end, to which a
point mass is attached, is free.

In that case one can speak of arbitrary initial angles $\varphi_0 \in [0,\pi]$. For
$\varphi_0 = 0$ and $\varphi_0 = \pi$, the pendulum will not oscillate at all, being in a state of
stable equilibrium in the first case and unstable equilibrium in the second.

It is interesting to note that (6.75) and (6.76) easily imply that $T \to \infty$ as
$\varphi_0 \to \pi - 0$, that is, the period of oscillation of a pendulum increases without
bound as its initial position approaches the upper (unstable) equilibrium
position.
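
The growth of the period near the upper equilibrium can be seen numerically. The sketch below is an illustration under stated assumptions, not the text's own method: the substitution $\sin\frac{\psi}{2} = k\sin u$ with $k = \sin\frac{\varphi_0}{2}$ (not carried out in the text) turns the pendulum integral into $T = 4\sqrt{L/g}\,K(k)$, where $K$ is the complete elliptic integral of the first kind, and $K$ is then evaluated by the classical arithmetic-geometric mean identity $K(k) = \pi / \bigl(2\,\mathrm{AGM}(1, \sqrt{1-k^2})\bigr)$. The names `agm` and `pendulum_period` and the default values $L = 1$, $g = 9.81$ are assumptions made for the example.

```python
import math


def agm(a: float, b: float) -> float:
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(100):
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a


def pendulum_period(phi0: float, length: float = 1.0, g: float = 9.81) -> float:
    """Period for an initial angle 0 < phi0 < pi, computed as 4*sqrt(L/g)*K(k)."""
    # K(k) = pi / (2 * AGM(1, sqrt(1 - k**2))) with k = sin(phi0 / 2),
    # and sqrt(1 - k**2) = cos(phi0 / 2) on this range of angles.
    k_complement = math.cos(0.5 * phi0)
    big_k = math.pi / (2.0 * agm(1.0, k_complement))
    return 4.0 * math.sqrt(length / g) * big_k


if __name__ == "__main__":
    for phi0 in (0.01, 1.0, 3.0, math.pi - 1e-3, math.pi - 1e-9):
        print(phi0, pendulum_period(phi0))
```

For small angles the value is close to the familiar $2\pi\sqrt{L/g}$, and it grows slowly (logarithmically in the distance to $\pi$) as the initial angle approaches the upper equilibrium.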


c. Conditional Convergence of an Improper Integral 


Definition 5. If an improper integral converges but not absolutely, we say 
that it converges conditionally. 


Example 12. Using Remark 1, by the formula for integration by parts in an
improper integral, we find that

$$\int_{\pi/2}^{+\infty} \frac{\sin x}{x}\,dx = -\frac{\cos x}{x}\Big|_{\pi/2}^{+\infty} - \int_{\pi/2}^{+\infty} \frac{\cos x}{x^2}\,dx = -\int_{\pi/2}^{+\infty} \frac{\cos x}{x^2}\,dx ,$$

provided the last integral converges. But, as we saw in Example 5, this integral
converges, and hence the integral

$$\int_{\pi/2}^{+\infty} \frac{\sin x}{x}\,dx \tag{6.77}$$

also converges.
At the same time, the integral (6.77) is not absolutely convergent. Indeed,
for $b \in [\pi/2, +\infty[$ we have

$$\int_{\pi/2}^b \left|\frac{\sin x}{x}\right| dx \ge \int_{\pi/2}^b \frac{\sin^2 x}{x}\,dx = \frac12\int_{\pi/2}^b \frac{dx}{x} - \frac12\int_{\pi/2}^b \frac{\cos 2x}{x}\,dx . \tag{6.78}$$

The integral

$$\int_{\pi/2}^{+\infty} \frac{\cos 2x}{x}\,dx ,$$

as can be verified through integration by parts, is convergent, so that as
$b \to +\infty$, the difference on the right-hand side of relation (6.78) tends to $+\infty$.
Thus, by estimate (6.78), the integral (6.77) is not absolutely convergent.
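
The logarithmic growth behind estimate (6.78) can be observed numerically. The sketch below is an illustration, not part of the original text: it approximates $\int_{\pi/2}^b |\sin x|/x\,dx$ with a composite midpoint rule and compares it with the divergent term $\frac12\ln\frac{2b}{\pi}$. The function names and the number of subintervals are assumptions made for the example.

```python
import math


def midpoint_rule(func, a: float, b: float, n: int = 200_000) -> float:
    """Composite midpoint approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def abs_integral(b: float) -> float:
    """Approximate integral of |sin x| / x over [pi/2, b]."""
    return midpoint_rule(lambda x: abs(math.sin(x)) / x, math.pi / 2, b)


def log_term(b: float) -> float:
    """The term (1/2) * integral of dx/x over [pi/2, b] appearing in (6.78)."""
    return 0.5 * math.log(2.0 * b / math.pi)


if __name__ == "__main__":
    for b in (10.0, 100.0, 1000.0):
        print(b, abs_integral(b), log_term(b))
```

The computed integrals keep increasing with $b$ and stay above the logarithmic term up to a bounded correction, which is the numerical face of non-absolute convergence.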


We now give a special convergence test for improper integrals based on 
the second mean-value theorem and hence essentially on the same formula 
for integration by parts. 


Proposition 4. (Abel–Dirichlet test for convergence of an integral). Let $x \mapsto f(x)$
and $x \mapsto g(x)$ be functions defined on an interval $[a,\omega[$ and integrable
on every closed interval $[a,b] \subset [a,\omega[$. Suppose that $g$ is monotonic.

Then a sufficient condition for convergence of the improper integral

$$\int_a^\omega (f\cdot g)(x)\,dx \tag{6.79}$$

is that one of the following pairs of conditions hold:

$\alpha_1$) the integral $\int_a^\omega f(x)\,dx$ converges,

$\beta_1$) the function $g$ is bounded on $[a,\omega[$,

or

$\alpha_2$) the function $F(b) = \int_a^b f(x)\,dx$ is bounded on $[a,\omega[$,

$\beta_2$) the function $g(x)$ tends to zero as $x \to \omega$, $x \in [a,\omega[$.

Proof. For any $b_1$ and $b_2$ in $[a,\omega[$ we have, by the second mean-value theorem,

$$\int_{b_1}^{b_2} (f\cdot g)(x)\,dx = g(b_1)\int_{b_1}^{\xi} f(x)\,dx + g(b_2)\int_{\xi}^{b_2} f(x)\,dx ,$$

where $\xi$ is a point lying between $b_1$ and $b_2$. Hence by the Cauchy convergence
criterion (Proposition 2), we conclude that the integral (6.79) does indeed
converge if either of the two pairs of conditions holds. $\square$


6.5.3 Improper Integrals with More than one Singularity 


Up to now we have spoken only of improper integrals with one singularity 
caused either by the unboundedness of the function at one of the endpoints 
of the interval of integration or by an infinite limit of integration. In this 
subsection we shall show in what sense other possible variants of an improper 
integral can be taken. 

If both limits of integration are singularities of either of these two types, 
then by definition 


$$\int_{\omega_1}^{\omega_2} f(x)\,dx := \int_{\omega_1}^c f(x)\,dx + \int_c^{\omega_2} f(x)\,dx , \tag{6.80}$$

where $c$ is an arbitrary point of the open interval $]\omega_1, \omega_2[$.

It is assumed here that each of the improper integrals on the right-hand
side of (6.80) converges. Otherwise we say that the integral on the left-hand
side of (6.80) diverges.

By Remark 2 and the additive property of the improper integral, the
definition (6.80) is unambiguous in the sense that it is independent of the
choice of the point $c \in\, ]\omega_1, \omega_2[$.


Example 13.

$$\int_{-1}^1 \frac{dx}{\sqrt{1-x^2}} = \int_{-1}^0 \frac{dx}{\sqrt{1-x^2}} + \int_0^1 \frac{dx}{\sqrt{1-x^2}} = \arcsin x\,\Big|_{-1}^0 + \arcsin x\,\Big|_0^1 = \arcsin x\,\Big|_{-1}^1 = \pi .$$

Example 14. The integral

$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx$$

is called the Euler-Poisson integral, and sometimes the Gaussian integral. It
obviously converges in the sense given above. It will be shown later that its
value is $\sqrt{\pi}$.
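Example 15. The integral

$$\int_0^{+\infty} \frac{dx}{x^\alpha}$$

diverges, since for every $\alpha$ at least one of the two integrals

$$\int_0^1 \frac{dx}{x^\alpha} \quad\text{and}\quad \int_1^{+\infty} \frac{dx}{x^\alpha}$$

diverges.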


Example 16. The integral

$$\int_0^{+\infty} \frac{\sin x}{x^\alpha}\,dx$$

converges if each of the integrals

$$\int_0^1 \frac{\sin x}{x^\alpha}\,dx \quad\text{and}\quad \int_1^{+\infty} \frac{\sin x}{x^\alpha}\,dx$$

converges. The first of these integrals converges if $\alpha < 2$, since

$$\frac{\sin x}{x^\alpha} \sim \frac{1}{x^{\alpha-1}}$$

as $x \to +0$. The second integral converges if $\alpha > 0$, as one can verify directly
through an integration by parts similar to the one shown in Example 12, or
by citing the Abel–Dirichlet test. Thus the original integral has a meaning
for $0 < \alpha < 2$.


In the case when the integrand is not bounded in a neighborhood of one
of the interior points $\omega$ of the closed interval of integration $[a,b]$, we set

$$\int_a^b f(x)\,dx := \int_a^\omega f(x)\,dx + \int_\omega^b f(x)\,dx , \tag{6.81}$$

requiring that both of the integrals on the right-hand side exist.


Example 17. In the sense of the convention (6.81)

$$\int_{-1}^1 \frac{dx}{\sqrt{|x|}} = 4 .$$

Example 18. The integral $\int_{-1}^1 \frac{dx}{x}$ is not defined.

Besides (6.81), there is a second convention about computing the integral
of a function that is unbounded in a neighborhood of an interior point $\omega$ of
a closed interval of integration. To be specific, we set

$$\mathrm{PV}\int_a^b f(x)\,dx := \lim_{\delta\to+0}\left(\int_a^{\omega-\delta} f(x)\,dx + \int_{\omega+\delta}^b f(x)\,dx\right) \tag{6.82}$$

if the limit on the right-hand side exists. This limit is called, following
Cauchy, the principal value of the integral, and, to distinguish the definitions
(6.81) and (6.82), we put the letters PV in front of the second to indicate
that it is the principal value.

In accordance with this convention we have 


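Example 19.

$$\mathrm{PV}\int_{-1}^1 \frac{dx}{x} = 0 .$$

We also adopt the following definition:

$$\mathrm{PV}\int_{-\infty}^{+\infty} f(x)\,dx := \lim_{R\to+\infty}\int_{-R}^R f(x)\,dx . \tag{6.83}$$

Example 20.

$$\mathrm{PV}\int_{-\infty}^{+\infty} x\,dx = 0 .$$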
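
The role of the symmetric cut-off in the principal value can be made concrete. The sketch below is an illustration, not part of the original text: for the function $1/x$ on $[-1,1]$ (chosen here as the test case) it evaluates the sum of the two proper integrals in closed form, once with equal cut-offs on both sides of the singularity and once with unequal ones. The function name and parameters are hypothetical.

```python
import math


def truncated_integral(delta_left: float, delta_right: float) -> float:
    """Integral of 1/x over [-1, -delta_left] plus its integral over [delta_right, 1]."""
    left = math.log(delta_left)      # ln|x| evaluated from -1 to -delta_left
    right = -math.log(delta_right)   # ln x evaluated from delta_right to 1
    return left + right


if __name__ == "__main__":
    for delta in (1e-2, 1e-4, 1e-8):
        print(delta, truncated_integral(delta, delta), truncated_integral(delta, 2 * delta))
```

With equal cut-offs the sum is zero for every $\delta$, so the principal value exists; with cut-offs $\delta$ and $2\delta$ the sum is $-\ln 2$ for every $\delta$. The dependence on how the two cut-offs are coupled is why the integral has no value in the sense of two independent one-sided limits.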


Finally, if there are several (finitely many) singularities of one kind or 
another on the interval of integration, at interior points or endpoints, then 
the nonsingular points of the interval are divided into a finite number of such 
intervals, each containing only one singularity, and the integral is computed 
as the sum of the integrals over the closed intervals of the partition. 

It can be verified that the result of such a computation is not affected by 
the arbitrariness in the choice of a partition. 


Example 21. The precise definition of the logarithmic integral can now be
written as

$$\operatorname{li} x = \begin{cases} \displaystyle\int_0^x \frac{dt}{\ln t}, & \text{if } 0 < x < 1,\\[2ex] \displaystyle\mathrm{PV}\int_0^x \frac{dt}{\ln t}, & \text{if } 1 < x. \end{cases}$$

In the last case the symbol PV refers to the only interior singularity on
the interval $]0,x]$, which is located at 1. We remark that in the sense of the
definition in formula (6.81) this integral is not convergent.


6.5.4 Problems and Exercises 


1. Show that the following functions have the stated properties.

a) $\operatorname{Si}(x) = \int_0^x \frac{\sin t}{t}\,dt$ (the sine integral) is defined on all of $\mathbb{R}$, is an odd function,
and has a limit as $x \to +\infty$.

b) $\operatorname{si}(x) = -\int_x^\infty \frac{\sin t}{t}\,dt$ is defined on all of $\mathbb{R}$ and differs from $\operatorname{Si} x$ only by a
constant;

c) $\operatorname{Ci} x = -\int_x^\infty \frac{\cos t}{t}\,dt$ (the cosine integral) can be computed for sufficiently large
values of $x$ by the approximate formula $\operatorname{Ci} x \approx \frac{\sin x}{x}$; estimate the region of values
where the absolute error of this approximation is less than $10^{-4}$.


2. Show that

a) the integrals $\int_1^{+\infty} \frac{\sin x}{x^\alpha}\,dx$, $\int_1^{+\infty} \frac{\cos x}{x^\alpha}\,dx$ converge only for $\alpha > 0$, and absolutely
only for $\alpha > 1$;

b) the Fresnel integrals

$$C(x) = \frac{1}{\sqrt2}\int_0^{\sqrt{x}} \cos t^2\,dt, \qquad S(x) = \frac{1}{\sqrt2}\int_0^{\sqrt{x}} \sin t^2\,dt$$

are infinitely differentiable functions on the interval $]0,+\infty[$, and both have a limit
as $x \to +\infty$.


3. Show that

a) the elliptic integral of first kind

$$F(k,\varphi) = \int_0^{\sin\varphi} \frac{dt}{\sqrt{(1-t^2)(1-k^2t^2)}}$$

is defined for $0 \le k < 1$, $0 \le \varphi \le \frac{\pi}{2}$ and can be brought into the form

$$F(k,\varphi) = \int_0^\varphi \frac{d\psi}{\sqrt{1-k^2\sin^2\psi}} ;$$

b) the complete elliptic integral of first kind

$$K(k) = \int_0^{\pi/2} \frac{d\psi}{\sqrt{1-k^2\sin^2\psi}}$$

increases without bound as $k \to 1-0$.


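4. Show that

a) the exponential integral $\operatorname{Ei}(x) = \int_{-\infty}^x \frac{e^t}{t}\,dt$ is defined and infinitely differentiable
for $x < 0$;

b) $-\operatorname{Ei}(-x) = \frac{e^{-x}}{x}\left(1 - \frac{1}{x} + \frac{2!}{x^2} - \cdots + (-1)^n\frac{n!}{x^n} + o\!\left(\frac{1}{x^n}\right)\right)$ as $x \to +\infty$;

c) the series $\sum_{n=0}^\infty (-1)^n \frac{n!}{x^n}$ does not converge for any value of $x \in \mathbb{R}$;

d) $\operatorname{li}(x) \sim \frac{x}{\ln x}$ as $x \to +0$. (For the definition of the logarithmic integral $\operatorname{li}(x)$
see Example 21.)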
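5. Show that

a) the function $\Phi(x) = \frac{1}{\sqrt\pi}\int_{-x}^x e^{-t^2}\,dt$, called the error function and often denoted
$\operatorname{erf}(x)$, is defined, odd, and infinitely differentiable on $\mathbb{R}$ and has a limit as
$x \to +\infty$;

b) if the limit in a) is equal to 1 (and it is), then

$$\operatorname{erf}(x) = \frac{2}{\sqrt\pi}\int_0^x e^{-t^2}\,dt = 1 - \frac{2}{\sqrt\pi}\,e^{-x^2}\left(\frac{1}{2x} - \frac{1}{2^2x^3} + \frac{1\cdot3}{2^3x^5} - \frac{1\cdot3\cdot5}{2^4x^7} + o\!\left(\frac{1}{x^7}\right)\right)$$

as $x \to +\infty$.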


6. Prove the following statements.

a) If a heavy particle slides under gravitational attraction along a curve given
in parametric form as $x = x(\theta)$, $y = y(\theta)$, and at time $t = 0$ the particle had zero
velocity and was located at the point $x_0 = x(\theta_0)$, $y_0 = y(\theta_0)$, then the following
relation holds between the parameter $\theta$ defining a point on the curve and the time
$t$ at which the particle passes this point (see formula (6.63) of Sect. 6.4)

$$t = \pm\int_{\theta_0}^\theta \sqrt{\frac{(x'(\theta))^2 + (y'(\theta))^2}{2g\,(y_0 - y(\theta))}}\;d\theta ,$$

in which the improper integral necessarily converges if $y'(\theta_0) \ne 0$. (The ambiguous
sign is chosen positive or negative according as $t$ and $\theta$ have the same kind of
monotonicity or the opposite kind; that is, if an increasing $\theta$ corresponds to an
increasing $t$, then one must obviously choose the positive sign.)

b) The period of oscillation of a particle in a well having cross section in the
shape of a cycloid

$$x = R(\theta + \pi + \sin\theta), \qquad y = -R(1+\cos\theta), \qquad |\theta| \le \pi ,$$

is independent of the level $y_0 = -R(1+\cos\theta_0)$ from which it begins to slide and is
equal to $4\pi\sqrt{R/g}$ (see Problem 4 of Sect. 6.4).


7 Functions of Several Variables: 
their Limits and Continuity 


Up to now we have considered almost exclusively numerical-valued functions
$x \mapsto f(x)$ in which the number $f(x)$ was determined by giving a single number
$x$ from the domain of definition of the function.

However, many quantities of interest depend on not just one, but many
factors, and if the quantity itself and each of the factors that determine it
can be characterized by some number, then this dependence reduces to the
fact that a value $y = f(x^1,\ldots,x^n)$ of the quantity in question is made to
correspond to an ordered set $(x^1,\ldots,x^n)$ of numbers, each of which describes
the state of the corresponding factor. The quantity assumes this value when
the factors determining this quantity are in these states.

For example, the area of a rectangle is the product of the lengths of its
sides. The volume of a given quantity of gas is computed by the formula

$$V = R\,\frac{mT}{p} ,$$

where $R$ is a constant, $m$ is the mass, $T$ is the absolute temperature, and $p$
is the pressure of the gas. Thus the value of $V$ depends on a variable ordered
triple of numbers $(m, T, p)$, or, as we say, $V$ is a function of the three variables
$m$, $T$, and $p$.

Our goal is to learn how to study functions of several variables just as we 
learned how to study functions of one variable. 

As in the case of functions of one variable, the study of functions of several 
numerical variables begins by describing their domains of definition. 


7.1 The Space $\mathbb{R}^m$ and the Most Important Classes
of its Subsets


7.1.1 The Set $\mathbb{R}^m$ and the Distance in it


We make the convention that $\mathbb{R}^m$ denotes the set of ordered $m$-tuples
$(x^1,\ldots,x^m)$ of real numbers $x^i \in \mathbb{R}$, $(i = 1,\ldots,m)$.

Each such $m$-tuple will be denoted by a single letter $x = (x^1,\ldots,x^m)$
and, in accordance with convenient geometric terminology, will be called a
point of $\mathbb{R}^m$. The number $x^i$ in the set $(x^1,\ldots,x^m)$ will be called the $i$th
coordinate of the point $x = (x^1,\ldots,x^m)$.
The geometric analogies can be extended by introducing a distance on
$\mathbb{R}^m$ between the points $x_1 = (x_1^1,\ldots,x_1^m)$ and $x_2 = (x_2^1,\ldots,x_2^m)$ according
to the formula

$$d(x_1,x_2) = \sqrt{\sum_{i=1}^m (x_1^i - x_2^i)^2} . \tag{7.1}$$

The function

$$d : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$$

defined by the formula (7.1) obviously has the following properties:

a) $d(x_1,x_2) \ge 0$;

b) $(d(x_1,x_2) = 0) \Leftrightarrow (x_1 = x_2)$;

c) $d(x_1,x_2) = d(x_2,x_1)$;

d) $d(x_1,x_3) \le d(x_1,x_2) + d(x_2,x_3)$.


This last inequality (called, again because of geometric analogies, the 
triangle inequality) is a special case of Minkowski’s inequality (see Subsect. 
5.4.2). 

A function defined on pairs of points $(x_1,x_2)$ of a set $X$ and possessing
the properties a), b), c), and d) is called a metric or distance on $X$.

A set $X$ together with a fixed metric on it is called a metric space.

Thus we have turned $\mathbb{R}^m$ into a metric space by endowing it with the
metric given by relation (7.1).

The reader can get information on arbitrary metric spaces in Chapter
9 (Part 2). Here we do not wish to become distracted from the particular
metric space $\mathbb{R}^m$ that we need at the moment.

Since the space $\mathbb{R}^m$ with metric (7.1) will be our only metric space in
this chapter, forming our object of study, we have no need for the general
definition of a metric space at the moment. It is given only to explain the
term “space” used in relation to $\mathbb{R}^m$ and the term “metric” in relation to the
function (7.1).

It follows from (7.1) that for $i \in \{1,\ldots,m\}$

$$|x_1^i - x_2^i| \le d(x_1,x_2) \le \sqrt{m}\,\max_{1\le i\le m} |x_1^i - x_2^i| , \tag{7.2}$$

that is, the distance between the points $x_1, x_2 \in \mathbb{R}^m$ is small if and only if
the corresponding coordinates of these points are close together.
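
The distance (7.1) and the two-sided estimate (7.2) are easy to test on sample points. The sketch below is an illustration, not part of the original text: points of $\mathbb{R}^m$ are represented as Python lists of floats, and the names `dist`, `satisfies_7_2`, `random_point` and the tolerance `tol` (which absorbs rounding error) are assumptions made for the example.

```python
import math
import random


def dist(x1, x2) -> float:
    """Distance (7.1) between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


def satisfies_7_2(x1, x2, tol: float = 1e-12) -> bool:
    """Check max_i |x1^i - x2^i| <= d(x1, x2) <= sqrt(m) * max_i |x1^i - x2^i|."""
    m = len(x1)
    max_gap = max(abs(a - b) for a, b in zip(x1, x2))
    d = dist(x1, x2)
    return max_gap <= d + tol and d <= math.sqrt(m) * max_gap + tol


def random_point(m: int, rng: random.Random):
    """A point of R^m with coordinates drawn uniformly from [-10, 10]."""
    return [rng.uniform(-10.0, 10.0) for _ in range(m)]


if __name__ == "__main__":
    rng = random.Random(0)
    p, q = random_point(5, rng), random_point(5, rng)
    print(dist(p, q), satisfies_7_2(p, q))
```

Checking the left inequality for the largest coordinate gap is enough, since it then holds for every index $i$.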

It is clear from (7.2) and also from (7.1) that for $m = 1$, the set $\mathbb{R}^1$ is
the same as the set of real numbers, between whose points the distance is
measured in the standard way by the absolute value of the difference of the
numbers.

7.1.2 Open and Closed Sets in $\mathbb{R}^m$

Definition 1. For $\delta > 0$ the set

$$B(a;\delta) = \{x \in \mathbb{R}^m \mid d(a,x) < \delta\}$$

is called the ball with center $a \in \mathbb{R}^m$ of radius $\delta$ or the $\delta$-neighborhood of the
point $a \in \mathbb{R}^m$.


Definition 2. A set $G \subset \mathbb{R}^m$ is open in $\mathbb{R}^m$ if for every point $x \in G$ there is
a ball $B(x;\delta)$ such that $B(x;\delta) \subset G$.


Example 1. $\mathbb{R}^m$ is an open set in $\mathbb{R}^m$.


Example 2. The empty set $\varnothing$ contains no points at all and hence may be
regarded as satisfying Definition 2, that is, $\varnothing$ is an open set in $\mathbb{R}^m$.


Example 3. A ball $B(a;r)$ is an open set in $\mathbb{R}^m$. Indeed, if $x \in B(a;r)$, that
is, $d(a,x) < r$, then for $0 < \delta < r - d(a,x)$, we have $B(x;\delta) \subset B(a;r)$, since

$$(\xi \in B(x;\delta)) \Rightarrow (d(x,\xi) < \delta) \Rightarrow (d(a,\xi) \le d(a,x) + d(x,\xi) < d(a,x) + r - d(a,x) = r) .$$


Example 4. A set $G = \{x \in \mathbb{R}^m \mid d(a,x) > r\}$, that is, the set of points whose
distance from a fixed point $a \in \mathbb{R}^m$ is larger than $r$, is open. This fact is easy
to verify, as in Example 3, using the triangle inequality for the metric.


Definition 3. The set $F \subset \mathbb{R}^m$ is closed in $\mathbb{R}^m$ if its complement $G = \mathbb{R}^m \setminus F$
is open in $\mathbb{R}^m$.


Example 5. The set $\bar B(a;r) = \{x \in \mathbb{R}^m \mid d(a,x) \le r\}$, $r \ge 0$, that is, the set
of points whose distance from a fixed point $a \in \mathbb{R}^m$ is at most $r$, is closed, as
follows from Definition 3 and Example 4. The set $\bar B(a;r)$ is called the closed
ball with center $a$ of radius $r$.


Proposition 1. a) The union $\bigcup_{\alpha\in A} G_\alpha$ of the sets of any system $\{G_\alpha,\ \alpha \in A\}$
of open sets in $\mathbb{R}^m$ is an open set in $\mathbb{R}^m$.

b) The intersection $\bigcap_{i=1}^n G_i$ of a finite number of open sets in $\mathbb{R}^m$ is an
open set in $\mathbb{R}^m$.

a') The intersection $\bigcap_{\alpha\in A} F_\alpha$ of the sets of any system $\{F_\alpha,\ \alpha \in A\}$ of
closed sets $F_\alpha$ in $\mathbb{R}^m$ is a closed set in $\mathbb{R}^m$.

b') The union $\bigcup_{i=1}^n F_i$ of a finite number of closed sets in $\mathbb{R}^m$ is a closed
set in $\mathbb{R}^m$.
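
Proof. a) If $x \in \bigcup_{\alpha\in A} G_\alpha$, then there exists $\alpha_0 \in A$ such that $x \in G_{\alpha_0}$, and
consequently there is a $\delta$-neighborhood $B(x;\delta)$ of $x$ such that $B(x;\delta) \subset G_{\alpha_0}$.
But then $B(x;\delta) \subset \bigcup_{\alpha\in A} G_\alpha$.

b) Let $x \in \bigcap_{i=1}^n G_i$. Then $x \in G_i$, $(i = 1,\ldots,n)$. Let $\delta_1,\ldots,\delta_n$ be positive
numbers such that $B(x;\delta_i) \subset G_i$, $(i = 1,\ldots,n)$. Setting $\delta = \min\{\delta_1,\ldots,\delta_n\}$,
we obviously find that $\delta > 0$ and $B(x;\delta) \subset \bigcap_{i=1}^n G_i$.

a') Let us show that the set $C\bigl(\bigcap_{\alpha\in A} F_\alpha\bigr)$ complementary to $\bigcap_{\alpha\in A} F_\alpha$ in $\mathbb{R}^m$
is an open set in $\mathbb{R}^m$.

Indeed,

$$C\Bigl(\bigcap_{\alpha\in A} F_\alpha\Bigr) = \bigcup_{\alpha\in A} (CF_\alpha) = \bigcup_{\alpha\in A} G_\alpha ,$$

where the sets $G_\alpha = CF_\alpha$ are open in $\mathbb{R}^m$. Part a') now follows from a).

b') Similarly, from b) we obtain

$$C\Bigl(\bigcup_{i=1}^n F_i\Bigr) = \bigcap_{i=1}^n (CF_i) = \bigcap_{i=1}^n G_i . \qquad \square$$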


Example 6. The set $S(a;r) = \{x \in \mathbb{R}^m \mid d(a,x) = r\}$, $r \ge 0$, is called the
sphere of radius $r$ with center $a \in \mathbb{R}^m$. The complement of $S(a;r)$ in $\mathbb{R}^m$, by
Examples 3 and 4, is the union of open sets. Hence by the proposition just
proved it is open, and the sphere $S(a;r)$ is closed in $\mathbb{R}^m$.


Definition 4. An open set in $\mathbb{R}^m$ containing a given point is called a neighborhood
of that point in $\mathbb{R}^m$.


In particular, as follows from Example 3, the $\delta$-neighborhood of a point
is a neighborhood of it.


Definition 5. In relation to a set $E \subset \mathbb{R}^m$ a point $x \in \mathbb{R}^m$ is
an interior point if some neighborhood of it is contained in $E$;
an exterior point if it is an interior point of the complement of $E$ in $\mathbb{R}^m$;
a boundary point if it is neither an interior point nor an exterior point.


It follows from this definition that the characteristic property of a boundary
point of a set is that every neighborhood of it contains both points of the
set and points not in the set.


Example 7. The sphere $S(a;r)$, $r > 0$, is the set of boundary points of both
the open ball $B(a;r)$ and the closed ball $\bar B(a;r)$.


Example 8. A point $a \in \mathbb{R}^m$ is a boundary point of the set $\mathbb{R}^m \setminus a$, which has
no exterior points.

Example 9. All points of the sphere $S(a;r)$ are boundary points of it; regarded
as a subset of $\mathbb{R}^m$, the sphere $S(a;r)$ has no interior points.


Definition 6. A point $a \in \mathbb{R}^m$ is a limit point of the set $E \subset \mathbb{R}^m$ if for any
neighborhood $O(a)$ of $a$ the intersection $E \cap O(a)$ is an infinite set.


Definition 7. The union of a set $E$ and all its limit points in $\mathbb{R}^m$ is the
closure of $E$ in $\mathbb{R}^m$.


The closure of the set $E$ is usually denoted $\bar E$.


Example 10. The set $\bar B(a;r) = B(a;r) \cup S(a;r)$ is the set of limit points of
the open ball $B(a;r)$; that is why $\bar B(a;r)$, in contrast to $B(a;r)$, is called a
closed ball.


Example 11. $\bar S(a;r) = S(a;r)$.


Rather than proving this last equality, we shall prove the following useful
proposition.


Proposition 2. ($F$ is closed in $\mathbb{R}^m$) $\Leftrightarrow$ ($F = \bar F$ in $\mathbb{R}^m$).


In other words, $F$ is closed in $\mathbb{R}^m$ if and only if it contains all its limit
points.


Proof. Let $F$ be closed in $\mathbb{R}^m$, $x \in \mathbb{R}^m$, and $x \notin F$. Then the open set
$G = \mathbb{R}^m \setminus F$ is a neighborhood of $x$ that contains no points of $F$. Thus we
have shown that if $x \notin F$, then $x$ is not a limit point of $F$.

Let $F = \bar F$. We shall verify that the set $G = \mathbb{R}^m \setminus \bar F$ is open in $\mathbb{R}^m$. If
$x \in G$, then $x \notin \bar F$, and therefore $x$ is not a limit point of $F$. Hence there is a
neighborhood of $x$ containing only a finite number of points $x_1,\ldots,x_n$ of $F$.
Since $x \notin F$, one can construct, for example, balls about $x$, $O_1(x),\ldots,O_n(x)$
such that $x_i \notin O_i(x)$. Then $O(x) = \bigcap_{i=1}^n O_i(x)$ is an open neighborhood of $x$
containing no points of $F$ at all, that is, $O(x) \subset \mathbb{R}^m \setminus F$ and hence the set
$\mathbb{R}^m \setminus F = \mathbb{R}^m \setminus \bar F$ is open. Therefore $F$ is closed in $\mathbb{R}^m$. $\square$


7.1.3 Compact Sets in $\mathbb{R}^m$


Definition 8. A set $K \subset \mathbb{R}^m$ is compact if from every covering of $K$ by sets
that are open in $\mathbb{R}^m$ one can extract a finite covering.


Example 12. A closed interval $[a,b] \subset \mathbb{R}^1$ is compact by the finite covering
lemma (Heine–Borel theorem).


Example 13. A generalization to $\mathbb{R}^m$ of the concept of a closed interval is the
set

$$I = \{x \in \mathbb{R}^m \mid a^i \le x^i \le b^i,\ i = 1,\ldots,m\} ,$$

which is called an $m$-dimensional interval, or an $m$-dimensional block or an
$m$-dimensional parallelepiped.
We shall show that $I$ is compact in $\mathbb{R}^m$.


Proof. Assume that from some open covering of $I$ one cannot extract a finite
covering. Bisecting each of the coordinate closed intervals $I^i = \{x^i \in \mathbb{R} : a^i \le x^i \le b^i\}$,
$(i = 1,\ldots,m)$, we break the interval $I$ into $2^m$ intervals, at least
one of which does not admit a covering by a finite number of sets from the
open system we started with. We proceed with this interval exactly as with
the original interval. Continuing this division process, we obtain a sequence
of nested intervals $I = I_1 \supset I_2 \supset \cdots \supset I_n \supset \cdots$, none of which admits
a finite covering. If $I_n = \{x \in \mathbb{R}^m \mid a_n^i \le x^i \le b_n^i,\ i = 1,\ldots,m\}$, then for each
$i \in \{1,\ldots,m\}$ the coordinate closed intervals $a_n^i \le x^i \le b_n^i$ $(n = 1, 2, \ldots)$
form, by construction, a system of nested closed intervals whose lengths tend
to zero. By finding the point $\xi^i \in [a_n^i, b_n^i]$ common to all of these intervals
for each $i \in \{1,\ldots,m\}$, we obtain a point $\xi = (\xi^1,\ldots,\xi^m)$ belonging to all
the intervals $I = I_1, I_2, \ldots, I_n, \ldots$. Since $\xi \in I$, there is an open set $G$ in the
system of covering sets such that $\xi \in G$. Then for some $\delta > 0$ we also have
$B(\xi;\delta) \subset G$. But by construction and the relation (7.2) there exists $N$ such
that $I_n \subset B(\xi;\delta) \subset G$ for $n > N$. We have now reached a contradiction with
the fact that the intervals $I_n$ do not admit a finite covering by sets of the
given system. $\square$


Proposition 3. If $K$ is a compact set in $\mathbb{R}^m$, then

a) $K$ is closed in $\mathbb{R}^m$;

b) any closed subset of $\mathbb{R}^m$ contained in $K$ is itself compact.


Proof. a) We shall show that any point $a \in \mathbb{R}^m$ that is a limit point of
$K$ must belong to $K$. Suppose $a \notin K$. For each point $x \in K$ we construct
a neighborhood $G(x)$ such that $a$ has a neighborhood disjoint from
$G(x)$. The set $\{G(x)\}$, $x \in K$, consisting of all such neighborhoods forms
an open covering of the compact set $K$, from which we can select a finite
covering $G(x_1),\ldots,G(x_n)$. If now $O_i(a)$ is a neighborhood of $a$ such that
$G(x_i) \cap O_i(a) = \varnothing$, then the set $O(a) = \bigcap_{i=1}^n O_i(a)$ is also a neighborhood of
$a$, and obviously $K \cap O(a) = \varnothing$. Thus $a$ cannot be a limit point of $K$.

b) Suppose $F$ is a closed subset of $\mathbb{R}^m$ and $F \subset K$. Let $\{G_\alpha\}$, $\alpha \in A$, be
a covering of $F$ by sets that are open in $\mathbb{R}^m$. Adjoining to this collection the
open set $G = \mathbb{R}^m \setminus F$, we obtain an open covering of $\mathbb{R}^m$, and in particular,
an open covering of $K$, from which we select a finite covering of $K$. This finite
covering of $K$ will also cover the set $F$. Observing that $G \cap F = \varnothing$, one can
say that if $G$ belongs to this finite covering, we will still have a finite covering
of $F$ by sets of the original system $\{G_\alpha\}$, $\alpha \in A$, if we remove $G$. $\square$

Definition 9. The diameter of a set $E \subset \mathbb{R}^m$ is the quantity

$$d(E) := \sup_{x_1,x_2\in E} d(x_1,x_2) .$$

Definition 10. A set $E \subset \mathbb{R}^m$ is bounded if its diameter is finite.
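
For a finite set of points the supremum in Definition 9 is a maximum over pairs, so the diameter can be computed directly. The sketch below is an illustration, not part of the original text: points are represented as tuples of floats, the name `diameter` is hypothetical, and `math.dist` (available in Python 3.8 and later) computes the Euclidean distance (7.1).

```python
import math
from itertools import combinations


def diameter(points) -> float:
    """Largest pairwise distance in a finite collection of points of R^m."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


if __name__ == "__main__":
    print(diameter([(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]))
```

Every finite set is bounded in the sense of Definition 10; for infinite sets the supremum need not be attained and this brute-force approach does not apply.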


Proposition 4. If $K$ is a compact set in $\mathbb{R}^m$, then $K$ is a bounded subset of
$\mathbb{R}^m$.

Proof. Take an arbitrary point $a \in \mathbb{R}^m$ and consider the sequence of open
balls $\{B(a;n)\}$, $(n = 1,2,\ldots)$. They form an open covering of $\mathbb{R}^m$ and
consequently also of $K$. If $K$ were not bounded, it would be impossible to
select a finite covering of $K$ from this sequence. $\square$


Proposition 5. The set $K \subset \mathbb{R}^m$ is compact if and only if $K$ is closed and
bounded in $\mathbb{R}^m$.


Proof. The necessity of these conditions was proved in Propositions 3 and 4.

Let us verify that the conditions are sufficient. Since $K$ is a bounded
set, there exists an $m$-dimensional interval $I$ containing $K$. As was shown in
Example 13, $I$ is compact in $\mathbb{R}^m$. But if $K$ is a closed set contained in the
compact set $I$, then by Proposition 3b) it is itself compact. $\square$


7.1.4 Problems and Exercises 


1. The distance $d(E_1, E_2)$ between the sets $E_1, E_2 \subset \mathbb{R}^m$ is the quantity

$$d(E_1, E_2) := \inf_{x_1 \in E_1,\ x_2 \in E_2} d(x_1, x_2).$$

Give an example of closed sets $E_1$ and $E_2$ in $\mathbb{R}^m$ having no points in common for
which $d(E_1, E_2) = 0$.

2. Show that
a) the closure $\overline{E}$ in $\mathbb{R}^m$ of any set $E \subset \mathbb{R}^m$ is a closed set in $\mathbb{R}^m$;
b) the set $\partial E$ of boundary points of any set $E \subset \mathbb{R}^m$ is a closed set;
c) if $G$ is an open set in $\mathbb{R}^m$ and $F$ is closed in $\mathbb{R}^m$, then $G \setminus F$ is open in $\mathbb{R}^m$.


3. Show that if $K_1 \supset K_2 \supset \cdots \supset K_n \supset \cdots$ is a sequence of nested nonempty
compact sets, then $\bigcap_{i=1}^{\infty} K_i \ne \varnothing$.


4. a) In the space $\mathbb{R}^k$ a two-dimensional sphere $S^2$ and a circle $S^1$ are situated so
that the distance from any point of the sphere to any point of the circle is the same.
Is this possible?

b) Consider problem a) for spheres $S^m$, $S^n$ of arbitrary dimension in $\mathbb{R}^k$. Under
what relation on $m$, $n$, and $k$ is this situation possible?


7.2 Limits and Continuity of Functions
of Several Variables


7.2.1 The Limit of a Function 


In Chapter 3 we studied in detail the operation of passing to the limit for a
real-valued function $f : X \to \mathbb{R}$ defined on a set in which a base $\mathcal{B}$ was fixed.
In the next few sections we shall be considering functions $f : X \to \mathbb{R}^n$
defined on subsets of $\mathbb{R}^m$ with values in $\mathbb{R} = \mathbb{R}^1$ or more generally in $\mathbb{R}^n$,
$n \in \mathbb{N}$. We shall now make a number of additions to the theory of limits
connected with the specifics of this class of functions.
However, we begin with the basic general definition.


Definition 1. A point $A \in \mathbb{R}^n$ is the limit of the mapping $f : X \to \mathbb{R}^n$ over
a base $\mathcal{B}$ in $X$ if for every neighborhood $V(A)$ of the point there exists an
element $B \in \mathcal{B}$ of the base whose image $f(B)$ is contained in $V(A)$.


In brief,

$$\Bigl(\lim_{\mathcal{B}} f(x) = A\Bigr) := \bigl(\forall V(A)\ \exists B \in \mathcal{B}\ (f(B) \subset V(A))\bigr).$$


We see that the definition of the limit of a function $f : X \to \mathbb{R}^n$ is exactly
the same as the definition of the limit of a function $f : X \to \mathbb{R}$ if we keep in
mind what a neighborhood $V(A)$ of a point $A \in \mathbb{R}^n$ is for every $n \in \mathbb{N}$.


Definition 2. A mapping $f : X \to \mathbb{R}^n$ is bounded if the set $f(X) \subset \mathbb{R}^n$ is
bounded in $\mathbb{R}^n$.


Definition 3. Let $\mathcal{B}$ be a base in $X$. A mapping $f : X \to \mathbb{R}^n$ is ultimately
bounded over the base $\mathcal{B}$ if there exists an element $B$ of $\mathcal{B}$ on which $f$ is
bounded.


Taking these definitions into account and using the same reasoning that
we gave in Chapt. 3, one can verify without difficulty that

a function $f : X \to \mathbb{R}^n$ can have at most one limit over a given base $\mathcal{B}$ in
$X$;

a function $f : X \to \mathbb{R}^n$ having a limit over a base $\mathcal{B}$ is ultimately bounded
over that base.

Definition 1 can be rewritten in another form making explicit use of the
metric in $\mathbb{R}^n$, namely


Definition 1'.

$$\Bigl(\lim_{\mathcal{B}} f(x) = A \in \mathbb{R}^n\Bigr) := \bigl(\forall \varepsilon > 0\ \exists B \in \mathcal{B}\ \forall x \in B\ (d(f(x), A) < \varepsilon)\bigr)$$

or

Definition 1''.

$$\Bigl(\lim_{\mathcal{B}} f(x) = A \in \mathbb{R}^n\Bigr) := \Bigl(\lim_{\mathcal{B}} d(f(x), A) = 0\Bigr).$$


The specific property of a mapping $f : X \to \mathbb{R}^n$ is that, since a point
$y \in \mathbb{R}^n$ is an ordered $n$-tuple $(y^1, \ldots, y^n)$ of real numbers, defining a function
$f : X \to \mathbb{R}^n$ is equivalent to defining $n$ real-valued functions $f^i : X \to \mathbb{R}$
($i = 1, \ldots, n$), where $f^i(x) = y^i$ ($i = 1, \ldots, n$).

If $A = (A^1, \ldots, A^n)$ and $y = (y^1, \ldots, y^n)$, we have the inequalities

$$|y^i - A^i| \le d(y, A) \le \sqrt{n} \max_{1 \le i \le n} |y^i - A^i|, \tag{7.3}$$

from which one can see that

$$\lim_{\mathcal{B}} f(x) = A \iff \lim_{\mathcal{B}} f^i(x) = A^i \quad (i = 1, \ldots, n), \tag{7.4}$$

that is, convergence in $\mathbb{R}^n$ is coordinatewise.
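
The two-sided estimate (7.3) is easy to test numerically. The sketch below, which is only an illustration and proves nothing, checks both inequalities on randomly chosen pairs of points of $\mathbb{R}^4$; the function name `check_7_3`, the sample size, and the rounding tolerance are our own assumptions.

```python
import math
import random


def dist(y, a):
    """Euclidean distance d(y, A) in R^n."""
    return math.sqrt(sum((yi - ai) ** 2 for yi, ai in zip(y, a)))


def check_7_3(y, a):
    """True if max_i |y^i - A^i| <= d(y, A) <= sqrt(n) * max_i |y^i - A^i|.

    The left inequality is tested for the largest coordinate difference,
    which implies it for every coordinate.
    """
    n = len(y)
    diffs = [abs(yi - ai) for yi, ai in zip(y, a)]
    d = dist(y, a)
    tol = 1e-12 * (1.0 + d)  # allowance for floating-point rounding
    return max(diffs) <= d + tol and d <= math.sqrt(n) * max(diffs) + tol


rng = random.Random(0)
samples = [
    ([rng.uniform(-5, 5) for _ in range(4)], [rng.uniform(-5, 5) for _ in range(4)])
    for _ in range(1000)
]
all_ok = all(check_7_3(y, a) for y, a in samples)
```

The left inequality shows that convergence of the points forces convergence of each coordinate; the right one gives the converse.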

Now let $X = \mathbb{N}$ be the set of natural numbers and $\mathcal{B}$ the base $k \to \infty$,
$k \in \mathbb{N}$, in $X$. A function $f : \mathbb{N} \to \mathbb{R}^n$ in this case is a sequence $\{y_k\}$, $k \in \mathbb{N}$,
of points of $\mathbb{R}^n$.


Definition 4. A sequence $\{y_k\}$, $k \in \mathbb{N}$, of points $y_k \in \mathbb{R}^n$ is fundamental (a
Cauchy sequence) if for every $\varepsilon > 0$ there exists a number $N \in \mathbb{N}$ such that
$d(y_{k_1}, y_{k_2}) < \varepsilon$ for all $k_1, k_2 > N$.


One can conclude from the inequalities (7.3) that a sequence of points
$y_k = (y_k^1, \ldots, y_k^n) \in \mathbb{R}^n$ is a Cauchy sequence if and only if each sequence of
coordinates having the same labels $\{y_k^i\}$, $k \in \mathbb{N}$, $i = 1, \ldots, n$, is a Cauchy
sequence.

Taking account of relation (7.4) and the Cauchy criterion for numerical
sequences, one can now assert that a sequence of points of $\mathbb{R}^n$ converges if and
only if it is a Cauchy sequence.

In other words, the Cauchy criterion is also valid in $\mathbb{R}^n$.

Later on we shall call metric spaces in which every Cauchy sequence has
a limit complete metric spaces. Thus we have now established that $\mathbb{R}^n$ is a
complete metric space for every $n \in \mathbb{N}$.


Definition 5. The oscillation of a function $f : X \to \mathbb{R}^n$ on a set $E \subset X$ is
the quantity

$$\omega(f; E) := d(f(E)),$$

where $d(f(E))$ is the diameter of $f(E)$.


As one can see, this is a direct generalization of the definition of the
oscillation of a real-valued function, which Definition 5 becomes when $n = 1$.

The validity of the following Cauchy criterion for the existence of a limit
for functions $f : X \to \mathbb{R}^n$ with values in $\mathbb{R}^n$ results from the completeness of
$\mathbb{R}^n$.


Theorem 1. Let $X$ be a set and $\mathcal{B}$ a base in $X$. A function $f : X \to \mathbb{R}^n$ has
a limit over the base $\mathcal{B}$ if and only if for every $\varepsilon > 0$ there exists an element
$B \in \mathcal{B}$ of the base on which the oscillation of the function is less than $\varepsilon$.
Thus,

$$\exists \lim_{\mathcal{B}} f(x) \iff \forall \varepsilon > 0\ \exists B \in \mathcal{B}\ (\omega(f; B) < \varepsilon).$$


The proof of Theorem 1 is a verbatim repetition of the proof of the Cauchy
criterion for numerical functions (Theorem 4 in Sect. 3.2), except for one minor
change: $|f(x_1) - f(x_2)|$ must be replaced throughout by $d(f(x_1), f(x_2))$.

One can also verify Theorem 1 another way, regarding the Cauchy criterion
as known for real-valued functions and using relations (7.4) and (7.3).

The important theorem on the limit of a composite function also remains
valid for functions with values in $\mathbb{R}^n$.


Theorem 2. Let $Y$ be a set, $\mathcal{B}_Y$ a base in $Y$, and $g : Y \to \mathbb{R}^n$ a mapping
having a limit over the base $\mathcal{B}_Y$.

Let $X$ be a set, $\mathcal{B}_X$ a base in $X$, and $f : X \to Y$ a mapping of $X$ into
$Y$ such that for each $B_Y \in \mathcal{B}_Y$ there exists $B_X \in \mathcal{B}_X$ such that the image
$f(B_X)$ is contained in $B_Y$.

Under these conditions the composition $g \circ f : X \to \mathbb{R}^n$ of the mappings
$f$ and $g$ is defined and has a limit over the base $\mathcal{B}_X$, and

$$\lim_{\mathcal{B}_X} (g \circ f)(x) = \lim_{\mathcal{B}_Y} g(y).$$


The proof of Theorem 2 can be carried out either by repeating the proof
of Theorem 5 of Sect. 3.2, replacing $\mathbb{R}$ by $\mathbb{R}^n$, or by invoking that theorem
and using relation (7.4).

Up to now we have considered functions $f : X \to \mathbb{R}^n$ with values in $\mathbb{R}^n$,
without specifying their domains of definition $X$ in any way. From now on
we shall primarily be interested in the case when $X$ is a subset of $\mathbb{R}^m$.

As before, we make the following conventions.

$U(a)$ is a neighborhood of the point $a \in \mathbb{R}^m$;

$\mathring{U}(a)$ is a deleted neighborhood of $a \in \mathbb{R}^m$, that is, $\mathring{U}(a) := U(a) \setminus a$;

$U_E(a)$ is a neighborhood of $a$ in the set $E \subset \mathbb{R}^m$, that is, $U_E(a) := E \cap U(a)$;

$\mathring{U}_E(a)$ is a deleted neighborhood of $a$ in $E$, that is, $\mathring{U}_E(a) := E \cap \mathring{U}(a)$;

$x \to a$ is the base of deleted neighborhoods of $a$ in $\mathbb{R}^m$;

$x \to \infty$ is the base of neighborhoods of infinity, that is, the base consisting
of the sets $\mathbb{R}^m \setminus B(a; r)$;

$x \to a$, $x \in E$ or $(E \ni x \to a)$ is the base of deleted neighborhoods of $a$
in $E$ if $a$ is a limit point of $E$;

$x \to \infty$, $x \in E$ or $(E \ni x \to \infty)$ is the base of neighborhoods of infinity
in $E$ consisting of the sets $E \setminus B(a; r)$, if $E$ is an unbounded set.

In accordance with these definitions, one can, for example, give the following
specific form of Definition 1 for the limit of a function when speaking
of a function $f : E \to \mathbb{R}^n$ mapping a set $E \subset \mathbb{R}^m$ into $\mathbb{R}^n$:

$$\Bigl(\lim_{E \ni x \to a} f(x) = A\Bigr) := \bigl(\forall \varepsilon > 0\ \exists \mathring{U}_E(a)\ \forall x \in \mathring{U}_E(a)\ (d(f(x), A) < \varepsilon)\bigr).$$

The same thing can be written another way:

$$\Bigl(\lim_{E \ni x \to a} f(x) = A\Bigr) := \bigl(\forall \varepsilon > 0\ \exists \delta > 0\ \forall x \in E\ (0 < d(x, a) < \delta \Rightarrow d(f(x), A) < \varepsilon)\bigr).$$

Here it is understood that the distances $d(x, a)$ and $d(f(x), A)$ are measured
in the spaces ($\mathbb{R}^m$ and $\mathbb{R}^n$) in which these points lie.
Finally,

$$\Bigl(\lim_{x \to \infty} f(x) = A\Bigr) := \bigl(\forall \varepsilon > 0\ \exists B(a; r)\ \forall x \in \mathbb{R}^m \setminus B(a; r)\ (d(f(x), A) < \varepsilon)\bigr).$$


Let us also agree to say that, in the case of a mapping $f : X \to \mathbb{R}^n$, the
phrase "$f(x) \to \infty$ in the base $\mathcal{B}$" means that for any ball $B(A; r) \subset \mathbb{R}^n$
there exists an element $B \in \mathcal{B}$ of the base $\mathcal{B}$ such that $f(B) \subset \mathbb{R}^n \setminus B(A; r)$.


Example 1. Let $x \mapsto \pi^i(x)$ be the mapping $\pi^i : \mathbb{R}^m \to \mathbb{R}$ assigning to each
$x = (x^1, \ldots, x^m)$ in $\mathbb{R}^m$ its $i$th coordinate $x^i$. Thus

$$\pi^i(x) = x^i.$$

If $a = (a^1, \ldots, a^m)$, then obviously

$$\pi^i(x) \to a^i \quad \text{as } x \to a.$$

The function $x \mapsto \pi^i(x)$ does not tend to any finite value nor to infinity
as $x \to \infty$ if $m > 1$.
On the other hand,

$$f(x) = \sum_{i=1}^{m} \bigl(\pi^i(x)\bigr)^2 \to \infty \quad \text{as } x \to \infty.$$


One should not think that the limit of a function of several variables can
be found by computing successively the limits with respect to each of its
coordinates. The following examples show why this is not the case.


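Example 2. Let the function $f : \mathbb{R}^2 \to \mathbb{R}$ be defined at the point $(x, y) \in \mathbb{R}^2$
as follows:

$$f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2}, & \text{if } x^2 + y^2 \ne 0, \\[1ex] 0, & \text{if } x^2 + y^2 = 0. \end{cases}$$

Then $f(0, y) = f(x, 0) = 0$, while $f(x, x) = \frac{1}{2}$ for $x \ne 0$.
Hence this function has no limit as $(x, y) \to (0, 0)$.
On the other hand,

$$\lim_{y \to 0} \Bigl(\lim_{x \to 0} f(x, y)\Bigr) = \lim_{y \to 0} (0) = 0,$$

$$\lim_{x \to 0} \Bigl(\lim_{y \to 0} f(x, y)\Bigr) = \lim_{x \to 0} (0) = 0.$$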
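
The behavior described in Example 2 can be observed numerically. The following minimal sketch evaluates the function along the two coordinate axes and along the diagonal $y = x$; the sample points $t = 10^{-k}$ are an arbitrary choice of ours.

```python
def f(x, y):
    """The function of Example 2: xy / (x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)


ts = [10.0 ** (-k) for k in range(1, 8)]   # points approaching 0

along_x_axis = [f(t, 0.0) for t in ts]     # values on the line y = 0
along_y_axis = [f(0.0, t) for t in ts]     # values on the line x = 0
along_diagonal = [f(t, t) for t in ts]     # values on the line y = x
```

The values on the axes are all $0$ while those on the diagonal are all $\frac{1}{2}$, so no single number can serve as the limit at the origin, although both iterated limits equal $0$.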


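Example 3. For the function

$$f(x, y) = \begin{cases} \dfrac{x^2 - y^2}{x^2 + y^2}, & \text{if } x^2 + y^2 \ne 0, \\[1ex] 0, & \text{if } x^2 + y^2 = 0, \end{cases}$$

we have

$$\lim_{x \to 0} \Bigl(\lim_{y \to 0} f(x, y)\Bigr) = \lim_{x \to 0} \Bigl(\frac{x^2}{x^2}\Bigr) = 1,$$

$$\lim_{y \to 0} \Bigl(\lim_{x \to 0} f(x, y)\Bigr) = \lim_{y \to 0} \Bigl(-\frac{y^2}{y^2}\Bigr) = -1.$$


Example 4. For the function

$$f(x, y) = \begin{cases} x + y \sin \dfrac{1}{x}, & \text{if } x \ne 0, \\[1ex] 0, & \text{if } x = 0, \end{cases}$$

we have

$$\lim_{(x, y) \to (0, 0)} f(x, y) = 0,$$

$$\lim_{x \to 0} \Bigl(\lim_{y \to 0} f(x, y)\Bigr) = 0,$$

yet at the same time the iterated limit

$$\lim_{y \to 0} \Bigl(\lim_{x \to 0} f(x, y)\Bigr)$$

does not exist at all.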
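Example 5. The function

$$f(x, y) = \begin{cases} \dfrac{x^2 y}{x^4 + y^2}, & \text{if } x^2 + y^2 \ne 0, \\[1ex] 0, & \text{if } x^2 + y^2 = 0, \end{cases}$$

has a limit of zero upon approach to the origin along any ray $x = \alpha t$, $y = \beta t$.
At the same time, the function equals $\frac{1}{2}$ at any point of the form $(a, a^2)$,
where $a \ne 0$, and so the function has no limit as $(x, y) \to (0, 0)$.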
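
A short numerical experiment makes the contrast in Example 5 visible. The sketch below is only an illustration; the helper names `along_ray` and `along_parabola` and the sample values are our own.

```python
def f(x, y):
    """The function of Example 5: x^2 y / (x^4 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)


def along_ray(alpha, beta, t):
    """Value of f at the point (alpha*t, beta*t) of the ray through the origin."""
    return f(alpha * t, beta * t)


def along_parabola(a):
    """Value of f at the point (a, a^2) of the parabola y = x^2."""
    return f(a, a * a)


ray_values = [along_ray(1.0, 2.0, 10.0 ** (-k)) for k in range(1, 8)]
parabola_values = [along_parabola(10.0 ** (-k)) for k in range(1, 6)]
```

For $\beta \ne 0$ the value on the ray equals $\alpha^2 \beta t / (\alpha^4 t^2 + \beta^2)$, which tends to $0$ with $t$, whereas on the parabola the value stays at $\frac{1}{2}$ however close the point is to the origin.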
7.2.2 Continuity of a Function of Several Variables
and Properties of Continuous Functions


Let $E$ be a subset of $\mathbb{R}^m$ and $f : E \to \mathbb{R}^n$ a function defined on $E$ with values
in $\mathbb{R}^n$.


Definition 6. The function $f : E \to \mathbb{R}^n$ is continuous at $a \in E$ if for every
neighborhood $V(f(a))$ of the value $f(a)$ that the function assumes at $a$, there
exists a neighborhood $U_E(a)$ of $a$ in $E$ whose image $f(U_E(a))$ is contained
in $V(f(a))$.
Thus

$$(f : E \to \mathbb{R}^n \text{ is continuous at } a \in E) := \bigl(\forall V(f(a))\ \exists U_E(a)\ (f(U_E(a)) \subset V(f(a)))\bigr).$$


We see that Definition 6 has the same form as Definition 1 for continuity of
a real-valued function, which we are familiar with from Sect. 4.1. As was the
case there, we can give the following alternate expression for this definition:

$$(f : E \to \mathbb{R}^n \text{ is continuous at } a \in E) := \bigl(\forall \varepsilon > 0\ \exists \delta > 0\ \forall x \in E\ (d(x, a) < \delta \Rightarrow d(f(x), f(a)) < \varepsilon)\bigr),$$

or, if $a$ is a limit point of $E$,

$$(f : E \to \mathbb{R}^n \text{ is continuous at } a \in E) := \Bigl(\lim_{E \ni x \to a} f(x) = f(a)\Bigr).$$

As noted in Chapt. 4, the concept of continuity is of interest precisely in
connection with a point $a \in E$ that is a limit point of the set $E$ on which the
function $f$ is defined.


It follows from Definition 6 and relation (7.4) that the mapping $f : E \to
\mathbb{R}^n$ defined by the relation

$$(x^1, \ldots, x^m) = x \mapsto y = (y^1, \ldots, y^n) = \bigl(f^1(x^1, \ldots, x^m), \ldots, f^n(x^1, \ldots, x^m)\bigr)$$

is continuous at a point if and only if each of the functions $y^i = f^i(x^1, \ldots, x^m)$
is continuous at that point.
In particular, we recall that we defined a path in $\mathbb{R}^n$ to be a mapping $f :
I \to \mathbb{R}^n$ of an interval $I \subset \mathbb{R}$ defined by continuous functions $f^1(x), \ldots, f^n(x)$
in the form

$$x \mapsto y = (y^1, \ldots, y^n) = \bigl(f^1(x), \ldots, f^n(x)\bigr).$$

Thus we can now say that a path in $\mathbb{R}^n$ is a continuous mapping of an
interval $I \subset \mathbb{R}$ of the real line into $\mathbb{R}^n$.
By analogy with the definition of oscillation of a real-valued function at
a point, we introduce the concept of oscillation at a point for a function with
values in $\mathbb{R}^n$.
Let $E$ be a subset of $\mathbb{R}^m$, $a \in E$, and $B_E(a; r) = E \cap B(a; r)$.

Definition 7. The oscillation of the function $f : E \to \mathbb{R}^n$ at the point $a \in E$
is the quantity

$$\omega(f; a) := \lim_{r \to +0} \omega(f; B_E(a; r)).$$


From Definition 6 of continuity of a function, taking account of the properties
of a limit and the Cauchy criterion, we obtain a set of frequently used
local properties of continuous functions. We now list them.


Local properties of continuous functions 


a) A mapping $f : E \to \mathbb{R}^n$ of a set $E \subset \mathbb{R}^m$ is continuous at a point
$a \in E$ if and only if $\omega(f; a) = 0$.

b) A mapping $f : E \to \mathbb{R}^n$ that is continuous at $a \in E$ is bounded in some
neighborhood $U_E(a)$ of that point.

c) If the mapping $g : Y \to \mathbb{R}^k$ of the set $Y \subset \mathbb{R}^n$ is continuous at a point
$y_0 \in Y$ and the mapping $f : X \to Y$ of the set $X \subset \mathbb{R}^m$ is continuous at a
point $x_0 \in X$ and $f(x_0) = y_0$, then the mapping $g \circ f : X \to \mathbb{R}^k$ is defined,
and it is continuous at $x_0 \in X$.


Real-valued functions possess, in addition, the following properties.


d) If the function $f : E \to \mathbb{R}$ is continuous at the point $a \in E$ and
$f(a) > 0$ (or $f(a) < 0$), there exists a neighborhood $U_E(a)$ of $a$ in $E$ such
that $f(x) > 0$ (resp. $f(x) < 0$) for all $x \in U_E(a)$.

e) If the functions $f : E \to \mathbb{R}$ and $g : E \to \mathbb{R}$ are continuous at $a \in E$,
then any linear combination of them $(\alpha f + \beta g) : E \to \mathbb{R}$, where $\alpha, \beta \in \mathbb{R}$, their
product $(f \cdot g) : E \to \mathbb{R}$, and, if $g(x) \ne 0$ on $E$, their quotient $\bigl(\frac{f}{g}\bigr) : E \to \mathbb{R}$
are defined on $E$ and continuous at $a$.


Let us agree to say that the function $f : E \to \mathbb{R}^n$ is continuous on the set
$E$ if it is continuous at each point of the set.

The set of functions $f : E \to \mathbb{R}^n$ that are continuous on $E$ will be denoted
$C(E; \mathbb{R}^n)$ or simply $C(E)$, if the range of values of the functions is unambiguously
determined from the context. As a rule, this abbreviation will be
used when $\mathbb{R}^n = \mathbb{R}$.


Example 6. The functions $(x^1, \ldots, x^m) \overset{\pi^i}{\longmapsto} x^i$ ($i = 1, \ldots, m$), mapping
$\mathbb{R}^m$ onto $\mathbb{R}$ (projections), are obviously continuous at each point $a =
(a^1, \ldots, a^m) \in \mathbb{R}^m$, since $\lim_{x \to a} \pi^i(x) = a^i = \pi^i(a)$.


Example 7. Any function $x \mapsto f(x)$ defined on $\mathbb{R}$, for example $x \mapsto \sin x$, can
also be regarded as a function $(x, y) \overset{F}{\longmapsto} f(x)$ defined, say, on $\mathbb{R}^2$. In that case,
if $f$ was continuous as a function on $\mathbb{R}$, then the new function $(x, y) \overset{F}{\longmapsto} f(x)$
will be continuous as a function on $\mathbb{R}^2$. This can be verified either directly
from the definition of continuity or by remarking that the function $F$ is the
composition $(f \circ \pi^1)(x, y)$ of continuous functions.

In particular, it follows from this, when we take account of c) and e), that
the functions

$$f(x, y) = \sin x + e^{xy}, \qquad f(x, y) = \arctan\bigl(\ln(|x| + |y| + 1)\bigr),$$

for example, are continuous on $\mathbb{R}^2$.


We remark that the reasoning just used is essentially local, and the fact
that the functions $f$ and $F$ studied in Example 7 were defined on the entire
real line $\mathbb{R}$ or the plane $\mathbb{R}^2$ respectively was purely an accidental circumstance.


Example 8. The function $f(x, y)$ of Example 2 is continuous at any point
of the space $\mathbb{R}^2$ except $(0, 0)$. We remark that, despite the discontinuity of
$f(x, y)$ at this point, the function is continuous in either of its two variables
for each fixed value of the other variable.


Example 9. If a function $f : E \to \mathbb{R}^n$ is continuous on the set $E$ and $\tilde{E}$ is
a subset of $E$, then the restriction $f|_{\tilde{E}}$ of $f$ to this subset is continuous on
$\tilde{E}$, as follows immediately from the definition of continuity of a function at a
point.


We now turn to the global properties of continuous functions. To state
them for functions $f : E \to \mathbb{R}^n$, we first give two definitions.


Definition 8. A mapping $f : E \to \mathbb{R}^n$ of a set $E \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is uniformly
continuous on $E$ if for every $\varepsilon > 0$ there is a number $\delta > 0$ such that
$d(f(x_1), f(x_2)) < \varepsilon$ for any points $x_1, x_2 \in E$ such that $d(x_1, x_2) < \delta$.


As before, the distances $d(x_1, x_2)$ and $d(f(x_1), f(x_2))$ are assumed to be
measured in $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively.

When $m = n = 1$, this definition is the definition of uniform continuity
of numerical-valued functions that we already know.


Definition 9. A set $E \subset \mathbb{R}^m$ is pathwise connected if for any pair of its
points $x_0$, $x_1$, there exists a path $\Gamma : I \to E$ with support in $E$ and endpoints
at these points.


In other words, it is possible to go from any point $x_0 \in E$ to any other
point $x_1 \in E$ without leaving $E$.

Since we shall not be considering any other concept of connectedness for a
set except pathwise connectedness for the time being, for the sake of brevity
we shall temporarily call pathwise connected sets simply connected.


Definition 10. A domain in $\mathbb{R}^m$ is an open connected set.

Example 10. An open ball $B(a; r)$, $r > 0$, in $\mathbb{R}^m$ is a domain. We already
know that $B(a; r)$ is open in $\mathbb{R}^m$. Let us verify that the ball is connected. Let
$x_0 = (x_0^1, \ldots, x_0^m)$ and $x_1 = (x_1^1, \ldots, x_1^m)$ be two points of the ball. The path
defined by the functions $x^i(t) = t x_1^i + (1 - t) x_0^i$ ($i = 1, \ldots, m$), defined on
the closed interval $0 \le t \le 1$, has $x_0$ and $x_1$ as its endpoints. In addition,
its support lies in the ball $B(a; r)$, since, by Minkowski's inequality, for any
$t \in [0, 1]$,

$$\begin{aligned}
d(x(t), a) &= \sqrt{\sum_{i=1}^{m} \bigl(x^i(t) - a^i\bigr)^2} = \sqrt{\sum_{i=1}^{m} \bigl(t(x_1^i - a^i) + (1 - t)(x_0^i - a^i)\bigr)^2} \\
&\le t \sqrt{\sum_{i=1}^{m} (x_1^i - a^i)^2} + (1 - t) \sqrt{\sum_{i=1}^{m} (x_0^i - a^i)^2} < tr + (1 - t)r = r.
\end{aligned}$$
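
The estimate of Example 10 can be checked on a concrete ball. The sketch below samples two points of an open ball in $\mathbb{R}^3$ and verifies that points of the segment joining them stay inside; the center, the radius, the random seed, and the helper names are arbitrary choices of ours, and a numerical check of this kind is no substitute for the inequality above.

```python
import math
import random


def dist(x, y):
    """Euclidean distance in R^m."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def segment_point(x0, x1, t):
    """Point x(t) = t*x1 + (1 - t)*x0 of the path joining x0 (t = 0) to x1 (t = 1)."""
    return tuple(t * b + (1.0 - t) * a for a, b in zip(x0, x1))


def random_point_in_ball(center, r, rng):
    """Rejection sampling: draw from the enclosing cube until the point lies in B(center; r)."""
    while True:
        p = tuple(c + rng.uniform(-r, r) for c in center)
        if dist(p, center) < r:
            return p


rng = random.Random(1)
center, r = (1.0, -2.0, 0.5), 2.0
x0 = random_point_in_ball(center, r, rng)
x1 = random_point_in_ball(center, r, rng)

# Check d(x(t), a) < r at 101 equally spaced parameter values t in [0, 1].
inside = all(dist(segment_point(x0, x1, k / 100.0), center) < r for k in range(101))
```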


Example 11. The circle (one-dimensional sphere) of radius $r > 0$ is the subset
of $\mathbb{R}^2$ given by the equation $(x^1)^2 + (x^2)^2 = r^2$. Setting $x^1 = r \cos t$, $x^2 =
r \sin t$, we see that any two points of the circle can be joined by a path that
goes along the circle. Hence a circle is a connected set. However, this set is
not a domain in $\mathbb{R}^2$, since it is not open in $\mathbb{R}^2$.


We now state the basic facts about continuous functions in the large.

Global properties of continuous functions


a) If a mapping $f : K \to \mathbb{R}^n$ is continuous on a compact set $K \subset \mathbb{R}^m$,
then it is uniformly continuous on $K$.

b) If a mapping $f : K \to \mathbb{R}^n$ is continuous on a compact set $K \subset \mathbb{R}^m$,
then it is bounded on $K$.

c) If a function $f : K \to \mathbb{R}$ is continuous on a compact set $K \subset \mathbb{R}^m$, then
it assumes its maximal and minimal values at some points of $K$.

d) If a function $f : E \to \mathbb{R}$ is continuous on a connected set $E$ and
assumes the values $f(a) = A$ and $f(b) = B$ at points $a, b \in E$, then for any
$C$ between $A$ and $B$, there is a point $c \in E$ at which $f(c) = C$.


Earlier (Sect. 4.2), when we were studying the local and global properties
of functions of one variable, we gave proofs of these properties that extend to
the more general case considered here. The only change that must be made
in the earlier proofs is that expressions of the type $|x_1 - x_2|$ or $|f(x_1) - f(x_2)|$
must be replaced by $d(x_1, x_2)$ and $d(f(x_1), f(x_2))$, where $d$ is the metric in
the space where the points in question are located. This remark applies fully
to everything except the last statement d), whose proof we now give.

Proof. d) Let $\Gamma : I \to E$ be a path that is a continuous mapping of an interval
$[\alpha, \beta] = I \subset \mathbb{R}$ such that $\Gamma(\alpha) = a$, $\Gamma(\beta) = b$. By the connectedness of $E$
there exists such a path. The function $f \circ \Gamma : I \to \mathbb{R}$, being the composition
of continuous functions, is continuous; therefore there is a point $\gamma$ of
the closed interval $[\alpha, \beta]$ at which $f \circ \Gamma(\gamma) = C$. Set $c = \Gamma(\gamma)$. Then $c \in E$
and $f(c) = C$. $\Box$


Example 12. The sphere $S(0; r)$ defined in $\mathbb{R}^m$ by the equation

$$(x^1)^2 + \cdots + (x^m)^2 = r^2$$

is a compact set.
Indeed, it follows from the continuity of the function

$$(x^1, \ldots, x^m) \mapsto (x^1)^2 + \cdots + (x^m)^2$$

that the sphere is closed, and from the fact that $|x^i| \le r$ ($i = 1, \ldots, m$) on
the sphere that it is bounded.
The function

$$(x^1, \ldots, x^m) \mapsto (x^1)^2 + \cdots + (x^k)^2 - (x^{k+1})^2 - \cdots - (x^m)^2$$

is continuous on all of $\mathbb{R}^m$, so that its restriction to the sphere is also continuous,
and by the global property c) of continuous functions assumes its
minimal and maximal values on the sphere. At the points $(r, 0, \ldots, 0)$ and
$(0, \ldots, 0, r)$ of the sphere this function assumes the values $r^2$ and $-r^2$
respectively (we assume $1 \le k < m$). By the
connectedness of the sphere (see Problem 3 at the end of this section), global
property d) of continuous functions now enables us to assert that there is a
point on the sphere where this function assumes the value 0.
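
The argument of Example 12 can be made concrete in a special case. The sketch below takes $r = 1$, $m = 3$, $k = 1$, joins the points $(1, 0, 0)$ and $(0, 0, 1)$ by a quarter of a great circle, and locates a zero of the function along this path by bisection. These choices, the path, and the names `q`, `path`, and `bisect_zero` are all assumptions of ours made for illustration.

```python
import math


def q(x, k):
    """(x^1)^2 + ... + (x^k)^2 - (x^{k+1})^2 - ... - (x^m)^2 for a point x of R^m."""
    return sum(c * c for c in x[:k]) - sum(c * c for c in x[k:])


def path(t):
    """A path on the unit sphere of R^3 from (1, 0, 0) at t = 0 to (0, 0, 1) at t = pi/2."""
    return (math.cos(t), 0.0, math.sin(t))


def bisect_zero(g, lo, hi, tol=1e-12):
    """Bisection for a continuous g whose values at lo and hi have opposite signs."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("g must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0:
            return mid
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # the sign change lies in the right half
        else:
            hi = mid                # the sign change lies in the left half
    return 0.5 * (lo + hi)


gamma = bisect_zero(lambda t: q(path(t), 1), 0.0, math.pi / 2)
c = path(gamma)   # a point of the sphere at which the function vanishes
```

Along this particular path the function equals $\cos^2 t - \sin^2 t = \cos 2t$, so the zero found is $\gamma = \pi/4$.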


Example 13. The open set $\mathbb{R}^m \setminus S(0; r)$ for $r > 0$ is not a domain, since it is
not connected.

Indeed, if $\Gamma : I \to \mathbb{R}^m$ is a path one end of which is at the point $x_0 =
(0, \ldots, 0)$ and the other at some point $x_1 = (x_1^1, \ldots, x_1^m)$ such that $(x_1^1)^2 + \cdots +
(x_1^m)^2 > r^2$, then the composition of the continuous functions $\Gamma : I \to \mathbb{R}^m$
and $f : \mathbb{R}^m \to \mathbb{R}$, where

$$(x^1, \ldots, x^m) \overset{f}{\longmapsto} (x^1)^2 + \cdots + (x^m)^2,$$

is a continuous function on $I$ assuming values less than $r^2$ at one endpoint
and greater than $r^2$ at the other. Hence there is a point $\gamma$ on $I$ at which
$(f \circ \Gamma)(\gamma) = r^2$. Then the point $x_\gamma = \Gamma(\gamma)$ in the support of the path turns
out to lie on the sphere $S(0; r)$. We have thus shown that it is impossible to
get out of the ball $B(0; r) \subset \mathbb{R}^m$ without intersecting its boundary sphere
$S(0; r)$.


7.2.3 Problems and Exercises


1. Let $f \in C(\mathbb{R}^m; \mathbb{R})$. Show that
a) the set $E_1 = \{x \in \mathbb{R}^m \mid f(x) < c\}$ is open in $\mathbb{R}^m$;
b) the set $E_2 = \{x \in \mathbb{R}^m \mid f(x) \le c\}$ is closed in $\mathbb{R}^m$;
c) the set $E_3 = \{x \in \mathbb{R}^m \mid f(x) = c\}$ is closed in $\mathbb{R}^m$;
d) if $f(x) \to +\infty$ as $x \to \infty$, then $E_2$ and $E_3$ are compact in $\mathbb{R}^m$;
e) for any $f : \mathbb{R}^m \to \mathbb{R}$ the set $E_4 = \{x \in \mathbb{R}^m \mid \omega(f; x) \ge \varepsilon\}$ is closed in $\mathbb{R}^m$.


2. Show that the mapping $f : \mathbb{R}^m \to \mathbb{R}^n$ is continuous if and only if the preimage
of every open set in $\mathbb{R}^n$ is an open set in $\mathbb{R}^m$.


3. Show that
a) the image $f(E)$ of a connected set $E \subset \mathbb{R}^m$ under a continuous mapping
$f : E \to \mathbb{R}^n$ is a connected set;
b) the union of connected sets having a point in common is a connected set;
c) the hemisphere $(x^1)^2 + \cdots + (x^m)^2 = 1$, $x^m \ge 0$, is a connected set;
d) the sphere $(x^1)^2 + \cdots + (x^m)^2 = 1$ is a connected set;
e) if $E \subset \mathbb{R}$ and $E$ is connected, then $E$ is an interval in $\mathbb{R}$ (that is, a closed
interval, a half-open interval, an open interval, or the entire real line);
f) if $x_0$ is an interior point and $x_1$ an exterior point in relation to the set $M \subset
\mathbb{R}^m$, then the support of any path with endpoints $x_0$, $x_1$ intersects the boundary of
the set $M$.


8 The Differential Calculus of Functions 
of Several Variables 


8.1 The Linear Structure on $\mathbb{R}^m$


8.1.1 $\mathbb{R}^m$ as a Vector Space


The concept of a vector space is already familiar to you from your study of
algebra.
If we introduce the operation of addition of elements $x_1 = (x_1^1, \ldots, x_1^m)$
and $x_2 = (x_2^1, \ldots, x_2^m)$ in $\mathbb{R}^m$ by the formula

$$x_1 + x_2 = (x_1^1 + x_2^1, \ldots, x_1^m + x_2^m), \tag{8.1}$$

and multiplication of an element $x = (x^1, \ldots, x^m)$ by a number $\lambda \in \mathbb{R}$ via
the relation

$$\lambda x = (\lambda x^1, \ldots, \lambda x^m), \tag{8.2}$$

then $\mathbb{R}^m$ becomes a vector space over the field of real numbers. Its points can
now be called vectors.
The vectors

$$e_i = (0, \ldots, 0, 1, 0, \ldots, 0) \quad (i = 1, \ldots, m) \tag{8.3}$$

(where the 1 stands only in the $i$th place) form a maximal linearly independent
set of vectors in this space, as a result of which it turns out to be an
$m$-dimensional vector space.

Any vector $x \in \mathbb{R}^m$ can be expanded with respect to the basis (8.3), that
is, represented in the form

$$x = x^1 e_1 + \cdots + x^m e_m. \tag{8.4}$$


When vectors are indexed, we shall write the index as a subscript, while
denoting its coordinates, as we have been doing, by superscripts. This is
convenient for many reasons, one of which is that, following Einstein,¹ we
can make the convention of writing expressions like (8.4) briefly in the
form

$$x = x^i e_i, \tag{8.5}$$

taking the simultaneous presence of subscript and superscript with the same
letter to indicate summation with respect to that letter over its range of
variation.

¹ A. Einstein (1879-1955): greatest physicist of the twentieth century. His work in
quantum theory and especially in the theory of relativity exerted a revolutionary
influence on all of modern physics.
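
The expansion (8.4), written with the summation convention as (8.5), can be spelled out as an explicit loop over the repeated index. The following sketch represents vectors of $\mathbb{R}^m$ as Python tuples; the helper names are our own.

```python
def basis_vector(i, m):
    """e_i in R^m: 1 in the i-th place (counting from 1), 0 elsewhere, as in (8.3)."""
    return tuple(1.0 if j == i else 0.0 for j in range(1, m + 1))


def add(x, y):
    """Coordinatewise sum, formula (8.1)."""
    return tuple(a + b for a, b in zip(x, y))


def scale(lam, x):
    """Multiplication of a vector by a number, formula (8.2)."""
    return tuple(lam * a for a in x)


def expand(x):
    """Rebuild x as the sum x^i e_i over i = 1, ..., m, as in (8.4) and (8.5)."""
    m = len(x)
    total = (0.0,) * m
    for i in range(1, m + 1):
        # x[i - 1] is the coordinate x^i; Python tuples are indexed from 0.
        total = add(total, scale(x[i - 1], basis_vector(i, m)))
    return total


x = (3.0, -1.0, 2.5)
rebuilt = expand(x)
```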


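8.1.2 Linear Transformations $L : \mathbb{R}^m \to \mathbb{R}^n$


We recall that a mapping $L : X \to Y$ from a vector space $X$ into a vector
space $Y$ is called linear if

$$L(\lambda_1 x_1 + \lambda_2 x_2) = \lambda_1 L(x_1) + \lambda_2 L(x_2)$$

for any $x_1, x_2 \in X$, and $\lambda_1, \lambda_2 \in \mathbb{R}$. We shall be interested in linear mappings
$L : \mathbb{R}^m \to \mathbb{R}^n$.

If $\{e_1, \ldots, e_m\}$ and $\{\tilde{e}_1, \ldots, \tilde{e}_n\}$ are fixed bases of $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively,
then, knowing the expansion

$$L(e_i) = a_i^1 \tilde{e}_1 + \cdots + a_i^n \tilde{e}_n = a_i^j \tilde{e}_j \quad (i = 1, \ldots, m) \tag{8.6}$$

of the images of the basis vectors under the linear mapping $L : \mathbb{R}^m \to \mathbb{R}^n$,
we can use the linearity of $L$ to find the expansion of the image $L(h)$ of any
vector $h = h^1 e_1 + \cdots + h^m e_m = h^i e_i$ in the basis $\{\tilde{e}_1, \ldots, \tilde{e}_n\}$. To be specific,

$$L(h) = L(h^i e_i) = h^i L(e_i) = h^i a_i^j \tilde{e}_j = a_i^j h^i \tilde{e}_j. \tag{8.7}$$

Hence, in coordinate notation:

$$L(h) = (a_i^1 h^i, \ldots, a_i^n h^i). \tag{8.8}$$

For a fixed basis in $\mathbb{R}^n$ the mapping $L : \mathbb{R}^m \to \mathbb{R}^n$ can thus be regarded
as a set

$$L = (L^1, \ldots, L^n) \tag{8.9}$$

of $n$ (coordinate) mappings $L^j : \mathbb{R}^m \to \mathbb{R}$.

Taking account of (8.8), we easily conclude that a mapping $L : \mathbb{R}^m \to \mathbb{R}^n$
is linear if and only if each mapping $L^j$ in the set (8.9) is linear.

If we write (8.9) as a column, taking account of relation (8.8), we have

$$L(h) = \begin{pmatrix} L^1(h) \\ \vdots \\ L^n(h) \end{pmatrix} = \begin{pmatrix} a_1^1 & \cdots & a_m^1 \\ \vdots & \ddots & \vdots \\ a_1^n & \cdots & a_m^n \end{pmatrix} \begin{pmatrix} h^1 \\ \vdots \\ h^m \end{pmatrix}. \tag{8.10}$$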

Thus, fixing bases in $\mathbb{R}^m$ and $\mathbb{R}^n$ enables us to establish a one-to-one correspondence
between linear transformations $L : \mathbb{R}^m \to \mathbb{R}^n$ and $m \times n$-matrices
$(a_i^j)$ ($i = 1, \ldots, m$, $j = 1, \ldots, n$), that is, matrices with $m$ columns and $n$ rows
as in (8.10). When this is done, the $i$th column of the
matrix $(a_i^j)$ corresponding to the transformation $L$ consists of the coordinates
of $L(e_i)$, the image of the vector $e_i \in \{e_1, \ldots, e_m\}$. The coordinates
of the image of an arbitrary vector $h = h^i e_i \in \mathbb{R}^m$ can be obtained from
(8.10) by multiplying the matrix of the linear transformation by the column
of coordinates of $h$.
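
The correspondence just described can be sketched in a few lines of Python: the matrix is assembled column by column from the images of the basis vectors, and $L(h)$ is then recovered as in (8.10). The sample map `L` below is hypothetical, and the standard bases of $\mathbb{R}^3$ and $\mathbb{R}^2$ are assumed.

```python
def matrix_of(L, m):
    """Matrix (a_i^j) of a linear map L: R^m -> R^n in the standard bases.

    The result is a list of n rows; column i holds the coordinates of L(e_i).
    """
    columns = [L(tuple(1.0 if j == i else 0.0 for j in range(m))) for i in range(m)]
    n = len(columns[0])
    return [[columns[i][j] for i in range(m)] for j in range(n)]


def apply(a, h):
    """Coordinates a_i^j h^i of L(h): the matrix times the column of coordinates of h."""
    return tuple(sum(row[i] * h[i] for i in range(len(h))) for row in a)


def L(h):
    """A sample linear map R^3 -> R^2, chosen only for illustration."""
    x, y, z = h
    return (2.0 * x - y, y + 3.0 * z)


a = matrix_of(L, 3)
h = (1.0, 2.0, 3.0)
image_by_matrix = apply(a, h)
image_direct = L(h)
```

Both computations give the same vector, as (8.7) and (8.8) predict.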

Since $\mathbb{R}^n$ has the structure of a vector space, one can speak of linear
combinations $\lambda_1 f_1 + \lambda_2 f_2$ of mappings $f_1 : X \to \mathbb{R}^n$ and $f_2 : X \to \mathbb{R}^n$,
setting

$$(\lambda_1 f_1 + \lambda_2 f_2)(x) := \lambda_1 f_1(x) + \lambda_2 f_2(x). \tag{8.11}$$

In particular, a linear combination of linear transformations $L_1 : \mathbb{R}^m \to
\mathbb{R}^n$ and $L_2 : \mathbb{R}^m \to \mathbb{R}^n$ is, according to the definition (8.11), a mapping

$$h \mapsto \lambda_1 L_1(h) + \lambda_2 L_2(h) = L(h),$$

which is obviously linear. The matrix of this transformation is the corresponding
linear combination of the matrices of the transformations $L_1$ and
$L_2$.

The composition $C = B \circ A$ of linear transformations $A : \mathbb{R}^m \to \mathbb{R}^n$ and
$B : \mathbb{R}^n \to \mathbb{R}^k$ is obviously also a linear transformation, whose matrix, as
follows from (8.10), is the product of the matrix of $A$ and the matrix of $B$
(the matrix of $B$ being the left factor). Actually, the law of multiplication for matrices
was defined in the way you are familiar with precisely so that the product
of matrices would correspond to the composition of the transformations.
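
The relation between composition and matrix multiplication can be checked on a small example. In the sketch below the matrices `A` and `B` are arbitrary illustrative data of ours, stored as lists of rows.

```python
def matmul(b, a):
    """Product b*a of two matrices stored as lists of rows."""
    return [
        [sum(b_row[s] * a[s][i] for s in range(len(a))) for i in range(len(a[0]))]
        for b_row in b
    ]


def apply(mat, h):
    """Matrix times the column of coordinates of h."""
    return tuple(sum(row[i] * h[i] for i in range(len(h))) for row in mat)


A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]     # matrix of A: R^3 -> R^2
B = [[2.0, 0.0],
     [1.0, 1.0],
     [0.0, 3.0],
     [1.0, -1.0]]          # matrix of B: R^2 -> R^4

C = matmul(B, A)           # matrix of the composition B o A: R^3 -> R^4
h = (1.0, -2.0, 4.0)

via_product = apply(C, h)                 # (BA)h
via_composition = apply(B, apply(A, h))   # B(Ah)
```

Note that the matrix of $B$ stands on the left in the product, in agreement with the order of the mappings in $B \circ A$.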


8.1.3 The Norm in $\mathbb{R}^m$


The quantity

$$\|x\| = \sqrt{(x^1)^2 + \cdots + (x^m)^2} \tag{8.12}$$

is called the norm of the vector $x = (x^1, \ldots, x^m) \in \mathbb{R}^m$.
It follows from this definition, taking account of Minkowski's inequality,
that

1° $\|x\| \ge 0$,

2° $(\|x\| = 0) \Leftrightarrow (x = 0)$,

3° $\|\lambda x\| = |\lambda| \cdot \|x\|$, where $\lambda \in \mathbb{R}$,

4° $\|x_1 + x_2\| \le \|x_1\| + \|x_2\|$.

In general, any function $\|\ \| : X \to \mathbb{R}$ on a vector space $X$ satisfying
conditions 1°-4° is called a norm on the vector space. Sometimes, to be
precise as to which norm is being discussed, the norm sign has a symbol
attached to it to denote the space in which it is being considered. For example,
we may write $\|x\|_{\mathbb{R}^m}$ or $\|y\|_{\mathbb{R}^n}$. As a rule, however, we shall not do that, since
it will always be clear from the context which space and which norm are
meant.

We remark that by (8.12)

$$\|x_2 - x_1\| = d(x_1, x_2), \tag{8.13}$$

where $d(x_1, x_2)$ is the distance in $\mathbb{R}^m$ between the vectors $x_1$ and $x_2$, regarded
as points of $\mathbb{R}^m$.
It is clear from (8.13) that the following conditions are equivalent:

$$x \to x_0, \qquad d(x, x_0) \to 0, \qquad \|x - x_0\| \to 0.$$

In view of (8.13), we have, in particular,

$$\|x\| = d(0, x).$$

Property 4° of a norm is called the triangle inequality, and it is now clear
why.

The triangle inequality extends by induction to the sum of any finite
number of terms. To be specific, the following inequality holds:

$$\|x_1 + \cdots + x_k\| \le \|x_1\| + \cdots + \|x_k\|.$$


The presence of the norm of a vector enables us to compare the size of
values of functions $f : X \to \mathbb{R}^m$ and $g : X \to \mathbb{R}^n$.

Let us agree to write $f(x) = o(g(x))$ or $f = o(g)$ over a base $\mathcal{B}$ in $X$ if
$\|f(x)\|_{\mathbb{R}^m} = o(\|g(x)\|_{\mathbb{R}^n})$ over the base $\mathcal{B}$.

If $f(x) = (f^1(x), \ldots, f^m(x))$ is the coordinate representation of the mapping
$f : X \to \mathbb{R}^m$, then in view of the inequalities

$$|f^i(x)| \le \|f(x)\| \le \sum_{i=1}^{m} |f^i(x)| \tag{8.14}$$

one can make the following observation, which will be useful below:

$$(f = o(g) \text{ over the base } \mathcal{B}) \Leftrightarrow (f^i = o(g) \text{ over the base } \mathcal{B};\ i = 1, \ldots, m). \tag{8.15}$$

We also make the convention that the statement $f = O(g)$ over the base
$\mathcal{B}$ in $X$ will mean that $\|f(x)\|_{\mathbb{R}^m} = O(\|g(x)\|_{\mathbb{R}^n})$ over the base $\mathcal{B}$.
We then obtain from (8.14)

$$(f = O(g) \text{ over the base } \mathcal{B}) \Leftrightarrow (f^i = O(g) \text{ over the base } \mathcal{B};\ i = 1, \ldots, m). \tag{8.16}$$

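Example. Consider a linear transformation $L : \mathbb{R}^m \to \mathbb{R}^n$. Let $h = h^1 e_1 +
\cdots + h^m e_m$ be an arbitrary vector in $\mathbb{R}^m$. Let us estimate $\|L(h)\|_{\mathbb{R}^n}$:

$$\|L(h)\| = \Bigl\| \sum_{i=1}^{m} h^i L(e_i) \Bigr\| \le \sum_{i=1}^{m} \|L(e_i)\|\, |h^i| \le \Bigl( \sum_{i=1}^{m} \|L(e_i)\| \Bigr) \|h\|. \tag{8.17}$$

Thus one can assert that

$$L(h) = O(h) \quad \text{as } h \to 0. \tag{8.18}$$

In particular, it follows from this that $L(x - x_0) = L(x) - L(x_0) \to 0$
as $x \to x_0$, that is, a linear transformation $L : \mathbb{R}^m \to \mathbb{R}^n$ is continuous at
every point $x_0 \in \mathbb{R}^m$. From estimate (8.17) it is even clear that a linear
transformation is uniformly continuous, since the constant in (8.17) does not
depend on the point: $\|L(x_1) - L(x_2)\| = \|L(x_1 - x_2)\| \le \bigl( \sum_{i=1}^{m} \|L(e_i)\| \bigr) \|x_1 - x_2\|$.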
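
Estimate (8.17) lends itself to a quick numerical test. In the sketch below the constant $\sum_i \|L(e_i)\|$ is computed as the sum of the norms of the columns of a matrix, and the inequality is checked on random vectors; the matrix, the random seed, and the rounding tolerance are illustrative assumptions of ours.

```python
import math
import random


def norm(x):
    """Euclidean norm (8.12)."""
    return math.sqrt(sum(c * c for c in x))


def apply(mat, h):
    """Matrix (list of rows) times the column of coordinates of h."""
    return tuple(sum(row[i] * h[i] for i in range(len(h))) for row in mat)


def column_norm_sum(mat):
    """The constant of (8.17): the sum over i of ||L(e_i)||, i.e. of the column norms."""
    m = len(mat[0])
    return sum(norm([row[i] for row in mat]) for i in range(m))


A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, 4.0]]      # matrix of a sample L: R^3 -> R^2
M = column_norm_sum(A)

rng = random.Random(2)
hs = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(1000)]
bound_holds = all(norm(apply(A, h)) <= M * norm(h) + 1e-12 for h in hs)
```

The constant obtained this way is usually far from the smallest possible one; (8.17) only asserts that some constant independent of $h$ exists.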


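8.1.4 The Euclidean Structure on $\mathbb{R}^m$


The concept of the inner product in a real vector space is known from algebra
as a numerical function $\langle x, y \rangle$ defined on pairs of vectors of the space and
possessing the properties

$$\langle x, x \rangle \ge 0,$$
$$\langle x, x \rangle = 0 \Leftrightarrow x = 0,$$
$$\langle x_1, x_2 \rangle = \langle x_2, x_1 \rangle,$$
$$\langle \lambda x_1, x_2 \rangle = \lambda \langle x_1, x_2 \rangle, \quad \text{where } \lambda \in \mathbb{R},$$
$$\langle x_1 + x_2, x_3 \rangle = \langle x_1, x_3 \rangle + \langle x_2, x_3 \rangle.$$

It follows in particular from these properties that if a basis $\{e_1, \ldots, e_m\}$
is fixed in the space, then the inner product $\langle x, y \rangle$ of two vectors $x$ and $y$ can
be expressed in terms of their coordinates $(x^1, \ldots, x^m)$ and $(y^1, \ldots, y^m)$ as
the bilinear form

$$\langle x, y \rangle = g_{ij} x^i y^j \tag{8.19}$$

(where summation over $i$ and $j$ is understood), in which $g_{ij} = \langle e_i, e_j \rangle$.
Vectors are said to be orthogonal if their inner product equals 0.

A basis $\{e_1, \ldots, e_m\}$ is orthonormal if $g_{ij} = \delta_{ij}$, where

$$\delta_{ij} = \begin{cases} 0, & \text{if } i \ne j, \\ 1, & \text{if } i = j. \end{cases}$$

In an orthonormal basis the inner product (8.19) has the very simple form

$$\langle x, y \rangle = \delta_{ij} x^i y^j,$$

or

$$\langle x, y \rangle = x^1 \cdot y^1 + \cdots + x^m \cdot y^m. \tag{8.20}$$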

Coordinates in which the inner product has this form are called Cartesian
coordinates.

We recall that the space $\mathbb{R}^m$ with an inner product defined in it is called
Euclidean space.

Between the inner product (8.20) and the norm of a vector (8.12) there
is an obvious connection

$$\langle x, x \rangle = \|x\|^2.$$

The following inequality is known from algebra:

$$\langle x, y \rangle^2 \le \langle x, x \rangle \langle y, y \rangle.$$

It shows in particular that for any pair of vectors there is an angle $\varphi \in [0, \pi]$
such that

$$\langle x, y \rangle = \|x\| \, \|y\| \cos \varphi.$$

This angle is called the angle between the vectors $x$ and $y$. That is the
reason we regard vectors whose inner product is zero as orthogonal.
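
In Cartesian coordinates the inner product (8.20), the norm, and the angle between two vectors are computed directly from the coordinates. The sketch below does this for nonzero vectors; the clamping of the cosine to $[-1, 1]$ is our own safeguard against rounding and is not part of the definition.

```python
import math


def inner(x, y):
    """Inner product in Cartesian coordinates, formula (8.20)."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Norm obtained from the inner product: ||x||^2 = <x, x>."""
    return math.sqrt(inner(x, x))


def angle(x, y):
    """Angle in [0, pi] between two nonzero vectors x and y."""
    c = inner(x, y) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding errors


x = (1.0, 2.0, 3.0)
y = (-4.0, 0.5, 2.0)
cauchy_schwarz_ok = inner(x, y) ** 2 <= inner(x, x) * inner(y, y)
right_angle = angle((1.0, 0.0), (0.0, 1.0))
```

The inequality checked in `cauchy_schwarz_ok` is exactly what guarantees that the quotient passed to `math.acos` lies in $[-1, 1]$.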

We shall also find useful the following simple, but very important fact,
known from algebra:

any linear function $L : \mathbb{R}^m \to \mathbb{R}$ in Euclidean space has the form

$$L(x) = \langle \xi, x \rangle,$$

where $\xi \in \mathbb{R}^m$ is a fixed vector determined uniquely by the function $L$.
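
In Cartesian coordinates the vector $\xi$ can be read off from the values of $L$ on the basis vectors, since $L(e_i) = \langle \xi, e_i \rangle = \xi^i$. The following sketch illustrates this for a hypothetical linear function on $\mathbb{R}^3$; the names are ours.

```python
def inner(x, y):
    """Inner product in Cartesian coordinates, formula (8.20)."""
    return sum(a * b for a, b in zip(x, y))


def representing_vector(L, m):
    """The vector xi with coordinates xi^i = L(e_i), so that L(x) = <xi, x>."""
    return tuple(L(tuple(1.0 if j == i else 0.0 for j in range(m))) for i in range(m))


def L(x):
    """A sample linear function R^3 -> R, chosen only for illustration."""
    return 2.0 * x[0] - x[1] + 0.5 * x[2]


xi = representing_vector(L, 3)
x = (1.0, 4.0, -2.0)
value_direct = L(x)
value_by_inner_product = inner(xi, x)
```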


8.2 The Differential of a Function of Several Variables 


8.2.1 Differentiability and the Differential of a Function at a Point 


Definition 1. A function f : E — R” defined on a set E C R”™ is differen- 
tiable at the point x € E, which is a limit point of E, if 


f(x +h) -— f(z) = L(x)h + a(a;h), (8.21) 


where L(x) : R” — R” is a function? that is linear in h and a(z; h) = o(h) 
ash>0,xr+hekE. 


The vectors
$$\Delta x(h) := (x+h) - x = h, \qquad \Delta f(x;h) := f(x+h) - f(x)$$
are called respectively the increment of the argument and the increment of
the function (corresponding to this increment of the argument). These vectors
are traditionally denoted by the symbols of the functions of $h$ themselves,
$\Delta x$ and $\Delta f(x)$. The linear function $L(x) : \mathbb{R}^m \to \mathbb{R}^n$ in (8.21) is
called the differential, tangent mapping, or derivative mapping of the
function $f : E \to \mathbb{R}^n$ at the point $x \in E$.

² By analogy with the one-dimensional case, we allow ourselves to write $L(x)h$
instead of $L(x)(h)$. We note also that in the definition we are assuming that
$\mathbb{R}^m$ and $\mathbb{R}^n$ are endowed with the norm of Sect. 8.1.

The differential of the function $f : E \to \mathbb{R}^n$ at a point $x \in E$ is denoted
by the symbols $df(x)$, $Df(x)$, or $f'(x)$.

In accordance with the notation just introduced, we can rewrite relation
(8.21) as
$$f(x+h) - f(x) = f'(x)h + \alpha(x;h)$$
or
$$\Delta f(x;h) = df(x)h + \alpha(x;h).$$


We remark that the differential is defined on the displacements $h$ from
the point $x \in \mathbb{R}^m$.

To emphasize this, we attach a copy of the vector space $\mathbb{R}^m$ to the point
$x \in \mathbb{R}^m$ and denote it $T_x\mathbb{R}^m$, $T\mathbb{R}^m(x)$, or $T\mathbb{R}^m_x$. The space $T\mathbb{R}^m_x$ can be
interpreted as a set of vectors attached at the point $x \in \mathbb{R}^m$. The vector
space $T\mathbb{R}^m_x$ is called the tangent space to $\mathbb{R}^m$ at $x \in \mathbb{R}^m$. The origin of this
terminology will be explained below.

The value of the differential on a vector $h \in T\mathbb{R}^m_x$ is the vector
$f'(x)h \in T\mathbb{R}^n_{f(x)}$ attached to the point $f(x)$ and approximating the increment
$f(x+h) - f(x)$ of the function caused by the increment $h$ of the argument $x$.

Thus $df(x)$ or $f'(x)$ is a linear transformation $f'(x) : T\mathbb{R}^m_x \to T\mathbb{R}^n_{f(x)}$.

We see that, in complete agreement with the one-dimensional case that
we studied, a vector-valued function of several variables is differentiable at a
point if its increment $\Delta f(x;h)$ at that point is linear as a function of $h$ up
to the correction term $\alpha(x;h)$, which is infinitesimal as $h \to 0$ compared to
the increment of the argument.
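
The condition $\alpha(x;h) = o(h)$ can be observed numerically. The sketch below
is our own illustration, not part of the text: the mapping `f`, the candidate
linear map `L`, and the base point are assumptions chosen for simplicity. It
prints the ratio $\|\alpha(x;h)\| / \|h\|$ for a sequence of shrinking displacements.

```python
import math

def f(x, y):
    # Illustrative mapping R^2 -> R^2 (our choice, not from the text).
    return (x * y, x + y ** 2)

def L(x, y, h):
    # Candidate differential at (x, y): the matrix [[y, x], [1, 2y]] applied to h.
    return (y * h[0] + x * h[1], h[0] + 2 * y * h[1])

def remainder_ratio(x, y, h):
    # ||alpha(x; h)|| / ||h||, where alpha = f(x + h) - f(x) - L(x)h.
    fx, fxh, lh = f(x, y), f(x + h[0], y + h[1]), L(x, y, h)
    alpha = [fxh[i] - fx[i] - lh[i] for i in range(2)]
    return math.hypot(alpha[0], alpha[1]) / math.hypot(h[0], h[1])

ratios = [remainder_ratio(1.0, 2.0, (10.0 ** -k, 10.0 ** -k)) for k in range(1, 6)]
print(ratios)   # the ratios shrink roughly in proportion to ||h||
```

For this particular mapping the remainder is exactly $(h^1 h^2, (h^2)^2)$, so the
ratio tends to zero, as the definition requires.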


8.2.2 The Differential and Partial Derivatives 
of a Real-valued Function 


If the vectors $f(x+h)$, $f(x)$, $L(x)h$, $\alpha(x;h)$ in $\mathbb{R}^n$ are written in coordinates,
Eq. (8.21) becomes equivalent to the $n$ equalities
$$f^i(x+h) - f^i(x) = L^i(x)h + \alpha^i(x;h) \quad (i = 1,\dots,n) \tag{8.22}$$
between real-valued functions, in which, as follows from relations (8.9) and
(8.15) of Sect. 8.1, $L^i(x) : \mathbb{R}^m \to \mathbb{R}$ are linear functions and $\alpha^i(x;h) = o(h)$
as $h \to 0$, $x + h \in E$, for every $i = 1,\dots,n$.

Thus we have the following proposition. 


Proposition 1. A mapping $f : E \to \mathbb{R}^n$ of a set $E \subset \mathbb{R}^m$ is differentiable at
a point $x \in E$ that is a limit point of $E$ if and only if the functions $f^i : E \to \mathbb{R}$
$(i = 1,\dots,n)$ that define the coordinate representation of the mapping are
differentiable at that point.

Since relations (8.21) and (8.22) are equivalent, to find the differential
$L(x)$ of a mapping $f : E \to \mathbb{R}^n$ it suffices to learn how to find the differentials
$L^i(x)$ of its coordinate functions $f^i : E \to \mathbb{R}$.

Thus, let us consider a real-valued function $f : E \to \mathbb{R}$, defined on a
set $E \subset \mathbb{R}^m$ and differentiable at an interior point $x \in E$ of that set. We
remark that in the future we shall mostly be dealing with the case when $E$ is
a domain in $\mathbb{R}^m$. If $x$ is an interior point of $E$, then for any sufficiently small
displacement $h$ from $x$ the point $x + h$ will also belong to $E$, and consequently
will also be in the domain of definition of the function $f : E \to \mathbb{R}$.

If we pass to the coordinate notation for the point $x = (x^1,\dots,x^m)$, the
vector $h = (h^1,\dots,h^m)$, and the linear function
$L(x)h = a_1(x)h^1 + \cdots + a_m(x)h^m$, then the condition
$$f(x+h) - f(x) = L(x)h + o(h) \quad \text{as } h \to 0 \tag{8.23}$$
can be rewritten as
$$f(x^1+h^1,\dots,x^m+h^m) - f(x^1,\dots,x^m) = a_1(x)h^1 + \cdots + a_m(x)h^m + o(h) \quad \text{as } h \to 0, \tag{8.24}$$
where $a_1(x),\dots,a_m(x)$ are real numbers connected with the point $x$.
We wish to find these numbers. To do this, instead of an arbitrary
displacement $h$ we consider the special displacement
$$h_i = h^i e_i = 0\cdot e_1 + \cdots + 0\cdot e_{i-1} + h^i e_i + 0\cdot e_{i+1} + \cdots + 0\cdot e_m$$
by a vector $h_i$ collinear with the vector $e_i$ of the basis $\{e_1,\dots,e_m\}$ in $\mathbb{R}^m$.
When $h = h_i$, it is obvious that $\|h\| = |h^i|$, and so by (8.24), for $h = h_i$
we obtain
$$f(x^1,\dots,x^{i-1},x^i+h^i,x^{i+1},\dots,x^m) - f(x^1,\dots,x^i,\dots,x^m) = a_i(x)h^i + o(h^i) \quad \text{as } h^i \to 0. \tag{8.25}$$
This means that if we fix all the variables in the function $f(x^1,\dots,x^m)$
except the $i$th one, the resulting function of the $i$th variable alone is
differentiable at the point $x^i$.
In that way, from (8.25) we find that
$$a_i(x) = \lim_{h^i \to 0} \frac{f(x^1,\dots,x^{i-1},x^i+h^i,x^{i+1},\dots,x^m) - f(x^1,\dots,x^i,\dots,x^m)}{h^i}. \tag{8.26}$$

Definition 2. The limit (8.26) is called the partial derivative of the function
$f(x)$ at the point $x = (x^1,\dots,x^m)$ with respect to the variable $x^i$. We denote
it by one of the following symbols:
$$\frac{\partial f}{\partial x^i}(x), \quad \partial_i f(x), \quad D_i f(x), \quad f'_{x^i}(x).$$

Example 1. If $f(u,v) = u^3 + v^2 \sin u$, then
$$\partial_1 f(u,v) = \frac{\partial f}{\partial u}(u,v) = 3u^2 + v^2 \cos u,$$
$$\partial_2 f(u,v) = \frac{\partial f}{\partial v}(u,v) = 2v \sin u.$$


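Example 2. If $f(x,y,z) = \arctan(xy^2) + e^z$, then
$$\partial_1 f(x,y,z) = \frac{\partial f}{\partial x}(x,y,z) = \frac{y^2}{1 + x^2y^4},$$
$$\partial_2 f(x,y,z) = \frac{\partial f}{\partial y}(x,y,z) = \frac{2xy}{1 + x^2y^4},$$
$$\partial_3 f(x,y,z) = \frac{\partial f}{\partial z}(x,y,z) = e^z.$$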


Thus we have proved the following result. 


Proposition 2. If a function $f : E \to \mathbb{R}$ defined on a set $E \subset \mathbb{R}^m$ is
differentiable at an interior point $x \in E$ of that set, then the function has a partial
derivative at that point with respect to each variable, and the differential of
the function is uniquely determined by these partial derivatives in the form
$$df(x)h = \frac{\partial f}{\partial x^1}(x)h^1 + \cdots + \frac{\partial f}{\partial x^m}(x)h^m. \tag{8.27}$$


Using the convention of summation on an index that appears as both a 
subscript and a superscript, we can write formula (8.27) succinctly: 


$$df(x)h = \partial_i f(x)h^i. \tag{8.28}$$


Example 3. If we had known (as we soon will know) that the function
$f(x,y,z)$ considered in Example 2 is differentiable at the point $(0,1,0)$, we
could have written immediately
$$df(0,1,0)h = 1\cdot h^1 + 0\cdot h^2 + 1\cdot h^3 = h^1 + h^3$$
and accordingly
$$f(h^1, 1+h^2, h^3) - f(0,1,0) = df(0,1,0)h + o(h)$$
or
$$\arctan\bigl(h^1(1+h^2)^2\bigr) + e^{h^3} = 1 + h^1 + h^3 + o(h) \quad \text{as } h \to 0.$$
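
The partial derivatives of Example 2 at the point $(0,1,0)$ and the linear
approximation of Example 3 can be checked numerically. The following is only a
sketch: the helper `partial`, the step size, and the test displacement `h` are
our own assumptions, and central differences are used in place of the limit
(8.26).

```python
import math

def f(x, y, z):
    # The function of Example 2.
    return math.atan(x * y ** 2) + math.exp(z)

def partial(func, point, i, step=1e-6):
    # Central-difference estimate of the i-th partial derivative (i = 0, 1, 2).
    up = list(point)
    dn = list(point)
    up[i] += step
    dn[i] -= step
    return (func(*up) - func(*dn)) / (2 * step)

p = (0.0, 1.0, 0.0)
grad = [partial(f, p, i) for i in range(3)]
print(grad)   # approximately [1, 0, 1]

h = (1e-3, -2e-3, 5e-4)
linear = f(*p) + h[0] + h[2]                        # 1 + h^1 + h^3
exact = f(p[0] + h[0], p[1] + h[1], p[2] + h[2])
print(exact - linear)   # small compared with the size of h
```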
Example 4. For the function $x = (x^1,\dots,x^m) \overset{\pi^i}{\longmapsto} x^i$, which assigns to the
point $x \in \mathbb{R}^m$ its $i$th coordinate, we have
$$\Delta\pi^i(x;h) = (x^i + h^i) - x^i = h^i,$$
that is, the increment of this function is itself a linear function in $h$:
$h \overset{\pi^i}{\longmapsto} h^i$. Thus, $\Delta\pi^i(x;h) = d\pi^i(x)h$, and the mapping $d\pi^i(x) = d\pi^i$ turns out
to be actually independent of $x \in \mathbb{R}^m$ in the sense that $d\pi^i(x)h = h^i$ at every
point $x \in \mathbb{R}^m$. If we write $x^i(x)$ instead of $\pi^i(x)$, we find that
$dx^i(x)h = dx^i h = h^i$.


Taking this fact and formula (8.28) into account, we can now represent
the differential of any function as a linear combination of the differentials of
the coordinates of its argument $x \in \mathbb{R}^m$. To be specific:
$$df(x) = \partial_i f(x)\,dx^i = \frac{\partial f}{\partial x^1}\,dx^1 + \cdots + \frac{\partial f}{\partial x^m}\,dx^m, \tag{8.29}$$
since for any vector $h \in T\mathbb{R}^m_x$ we have
$$df(x)h = \partial_i f(x)h^i = \partial_i f(x)\,dx^i h.$$


8.2.3 Coordinate Representation of the Differential 
of a Mapping. The Jacobi Matrix 


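Thus we have found formula (8.27) for the differential of a real-valued function
$f : E \to \mathbb{R}$. But then, by the equivalence of relations (8.21) and (8.22), for
any mapping $f : E \to \mathbb{R}^n$ of a set $E \subset \mathbb{R}^m$ that is differentiable at an interior
point $x \in E$, we can write the coordinate representation of the differential
$df(x)$ as
$$df(x)h = \begin{pmatrix} df^1(x)h \\ \vdots \\ df^n(x)h \end{pmatrix} = \begin{pmatrix} \partial_i f^1(x)h^i \\ \vdots \\ \partial_i f^n(x)h^i \end{pmatrix} = \begin{pmatrix} \dfrac{\partial f^1}{\partial x^1}(x) & \cdots & \dfrac{\partial f^1}{\partial x^m}(x) \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f^n}{\partial x^1}(x) & \cdots & \dfrac{\partial f^n}{\partial x^m}(x) \end{pmatrix} \begin{pmatrix} h^1 \\ \vdots \\ h^m \end{pmatrix}. \tag{8.30}$$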


Definition 3. The matrix $(\partial_i f^j(x))$ $(i = 1,\dots,m,\ j = 1,\dots,n)$ of partial
derivatives of the coordinate functions of a given mapping at the point $x \in E$
is called the Jacobi matrix³ or the Jacobian⁴ of the mapping at the point.


In the case when n = 1, we are simply brought back to formula (8.27), 
and when n = 1 and m = 1, we arrive at the differential of a real-valued 
function of one real variable. 

The equivalence of relations (8.21) and (8.22) and the uniqueness of the 
differential (8.27) of a real-valued function implies the following result. 


Proposition 3. If a mapping $f : E \to \mathbb{R}^n$ of a set $E \subset \mathbb{R}^m$ is differentiable
at an interior point $x \in E$, then it has a unique differential $df(x)$ at that
point, and the coordinate representation of the mapping
$df(x) : T\mathbb{R}^m_x \to T\mathbb{R}^n_{f(x)}$ is given by relation (8.30).
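
The Jacobi matrix of (8.30) can be estimated column by column from difference
quotients. The sketch below is an illustration under our own assumptions: the
helper `jacobian`, the step size, and the test mapping `polar` are not taken
from the text.

```python
import math

def jacobian(f, x, step=1e-6):
    # Central-difference estimate of the Jacobi matrix of f at x.
    # Row j holds the partial derivatives of the j-th coordinate function,
    # matching the layout of (8.30).
    n, m = len(f(x)), len(x)
    J = [[0.0] * m for _ in range(n)]
    for i in range(m):
        up, dn = list(x), list(x)
        up[i] += step
        dn[i] -= step
        f_up, f_dn = f(up), f(dn)
        for j in range(n):
            J[j][i] = (f_up[j] - f_dn[j]) / (2 * step)
    return J

def polar(p):
    # Hypothetical test mapping (r, phi) -> (r cos phi, r sin phi).
    r, phi = p
    return [r * math.cos(phi), r * math.sin(phi)]

r, phi = 2.0, math.pi / 6
numeric = jacobian(polar, [r, phi])
exact = [[math.cos(phi), -r * math.sin(phi)],
         [math.sin(phi), r * math.cos(phi)]]
print(numeric)
print(exact)
```

Agreement of the two matrices does not by itself prove differentiability; as
the next subsection shows, partial derivatives may exist at a point where the
mapping is not differentiable.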


³ C. G. J. Jacobi (1804-1851), well-known German mathematician.
⁴ The term Jacobian is more often applied to the determinant of this matrix (when
it is square).


8.2.4 Continuity, Partial Derivatives,
and Differentiability of a Function at a Point


We complete our discussion of the concept of differentiability of a function at 
a point by pointing out some connections among the continuity of a function 
at a point, the existence of partial derivatives of the function at that point, 
and differentiability at the point. 

In Sect. 8.1 (relations (8.17) and (8.18)) we established that if
$L : \mathbb{R}^m \to \mathbb{R}^n$ is a linear transformation, then $Lh \to 0$ as $h \to 0$. Therefore, one can
conclude from relation (8.21) that a function that is differentiable at a point
is continuous at that point, since
$$f(x+h) - f(x) = L(x)h + o(h) \quad \text{as } h \to 0,\ x + h \in E.$$


The converse, of course, is not true because, as we know, it fails even in 
the one-dimensional case. 

Thus the relation between continuity and differentiability of a function at 
a point in the multidimensional case is the same as in the one-dimensional 
case. 

The situation is completely different in regard to the relations between 
partial derivatives and the differential. In the one-dimensional case, that is, 
in the case of a real-valued function of one real variable, the existence of 
the differential and the existence of the derivative for a function at a point 
are equivalent conditions. For functions of several variables, we have shown 
(Proposition 2) that differentiability of a function at an interior point of its 
domain of definition guarantees the existence of a partial derivative with 
respect to each variable at that point. However, the converse is not true. 


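Example 5. The function
$$f(x^1, x^2) = \begin{cases} 0, & \text{if } x^1x^2 = 0, \\ 1, & \text{if } x^1x^2 \ne 0, \end{cases}$$
equals 0 on the coordinate axes and therefore has both partial derivatives at
the point $(0,0)$:
$$\partial_1 f(0,0) = \lim_{h^1 \to 0} \frac{f(h^1,0) - f(0,0)}{h^1} = \lim_{h^1 \to 0} \frac{0 - 0}{h^1} = 0,$$
$$\partial_2 f(0,0) = \lim_{h^2 \to 0} \frac{f(0,h^2) - f(0,0)}{h^2} = \lim_{h^2 \to 0} \frac{0 - 0}{h^2} = 0.$$

At the same time, this function is not differentiable at $(0,0)$, since it is
obviously discontinuous at that point.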


The function given in Example 5 fails to have one of its partial derivatives
at points of the coordinate axes different from $(0,0)$. However, the function
$$f(x,y) = \begin{cases} \dfrac{xy}{x^2 + y^2}, & \text{if } x^2 + y^2 \ne 0, \\ 0, & \text{if } x^2 + y^2 = 0 \end{cases}$$
(which we encountered in Example 2 of Sect. 7.2) has partial derivatives at
all points of the plane, but it also is discontinuous at the origin and hence
not differentiable there.
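
A short numerical experiment makes this visible. In the sketch below (our own
illustration, with sample values of `t` chosen arbitrarily), the difference
quotients along both axes vanish at the origin, while the values of the
function along the diagonal do not tend to $f(0,0) = 0$.

```python
def f(x, y):
    # xy / (x^2 + y^2) away from the origin, and 0 at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

for t in (1e-1, 1e-3, 1e-6):
    d1 = (f(t, 0.0) - f(0.0, 0.0)) / t   # difference quotient along the first axis
    d2 = (f(0.0, t) - f(0.0, 0.0)) / t   # difference quotient along the second axis
    print(t, d1, d2, f(t, t))            # quotients are 0, yet f(t, t) stays at 0.5
```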

Thus the possibility of writing the right-hand side of (8.27) and (8.28) 
still does not guarantee that this expression will represent the differential of 
the function we are considering, since the function may be nondifferentiable. 

This circumstance might have been a serious hindrance to the entire differ- 
ential calculus of functions of several variables, if it had not been determined 
(as will be proved below) that continuity of the partial derivatives at a point 
is a sufficient condition for differentiability of the function at that point. 


8.3 The Basic Laws of Differentiation 


8.3.1 Linearity of the Operation of Differentiation 


Theorem 1. If the mappings $f_1 : E \to \mathbb{R}^n$ and $f_2 : E \to \mathbb{R}^n$, defined on a
set $E \subset \mathbb{R}^m$, are differentiable at a point $x \in E$, then a linear combination
of them $(\lambda_1 f_1 + \lambda_2 f_2) : E \to \mathbb{R}^n$ is also differentiable at that point, and the
following equality holds:
$$(\lambda_1 f_1 + \lambda_2 f_2)'(x) = (\lambda_1 f_1' + \lambda_2 f_2')(x). \tag{8.31}$$

Equality (8.31) shows that the operation of differentiation, that is, forming
the differential of a mapping at a point, is a linear transformation on the
vector space of mappings $f : E \to \mathbb{R}^n$ that are differentiable at a given point
of the set $E$. The left-hand side of (8.31) contains by definition the linear
transformation $(\lambda_1 f_1 + \lambda_2 f_2)'(x)$, while the right-hand side contains the linear
combination $(\lambda_1 f_1' + \lambda_2 f_2')(x)$ of linear transformations $f_1'(x) : \mathbb{R}^m \to \mathbb{R}^n$
and $f_2'(x) : \mathbb{R}^m \to \mathbb{R}^n$, which, as we know from Sect. 8.1, is also a linear
transformation. Theorem 1 asserts that these mappings are the same.


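Proof.
$$\begin{aligned}
(\lambda_1 f_1 + \lambda_2 f_2)(x+h) - (\lambda_1 f_1 + \lambda_2 f_2)(x)
&= \bigl(\lambda_1 f_1(x+h) + \lambda_2 f_2(x+h)\bigr) - \bigl(\lambda_1 f_1(x) + \lambda_2 f_2(x)\bigr) \\
&= \lambda_1\bigl(f_1(x+h) - f_1(x)\bigr) + \lambda_2\bigl(f_2(x+h) - f_2(x)\bigr) \\
&= \lambda_1\bigl(f_1'(x)h + o(h)\bigr) + \lambda_2\bigl(f_2'(x)h + o(h)\bigr) \\
&= \bigl(\lambda_1 f_1'(x) + \lambda_2 f_2'(x)\bigr)h + o(h). \qquad \Box
\end{aligned}$$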
If the functions in question are real-valued, the operations of multiplication
and division (when the denominator is not zero) can also be performed.
We have then the following theorem.

Theorem 2. If the functions $f : E \to \mathbb{R}$ and $g : E \to \mathbb{R}$, defined on a set
$E \subset \mathbb{R}^m$, are differentiable at the point $x \in E$, then
a) their product is differentiable at $x$ and
$$(f \cdot g)'(x) = g(x)f'(x) + f(x)g'(x); \tag{8.32}$$
b) their quotient is differentiable at $x$ if $g(x) \ne 0$, and
$$\left(\frac{f}{g}\right)'(x) = \frac{1}{g^2(x)}\bigl(g(x)f'(x) - f(x)g'(x)\bigr). \tag{8.33}$$


The proof of this theorem is the same as the proof of the corresponding 
parts of Theorem 1 in Sect. 5.2, so that we shall omit the details. 

Relations (8.31), (8.32), and (8.33) can be rewritten in the other notations 
for the differential. To be specific: 


$$d(\lambda_1 f_1 + \lambda_2 f_2)(x) = (\lambda_1\,df_1 + \lambda_2\,df_2)(x),$$
$$d(f \cdot g)(x) = g(x)\,df(x) + f(x)\,dg(x),$$
$$d\left(\frac{f}{g}\right)(x) = \frac{1}{g^2(x)}\bigl(g(x)\,df(x) - f(x)\,dg(x)\bigr).$$


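Let us see what these equalities mean in the coordinate representation of
the mappings. We know that if a mapping $\varphi : E \to \mathbb{R}^n$ that is differentiable
at an interior point $x$ of the set $E \subset \mathbb{R}^m$ is written in the coordinate form
$$\varphi(x) = \begin{pmatrix} \varphi^1(x^1,\dots,x^m) \\ \vdots \\ \varphi^n(x^1,\dots,x^m) \end{pmatrix},$$
then the Jacobi matrix
$$\varphi'(x) = \begin{pmatrix} \partial_1\varphi^1 & \cdots & \partial_m\varphi^1 \\ \vdots & \ddots & \vdots \\ \partial_1\varphi^n & \cdots & \partial_m\varphi^n \end{pmatrix}(x) = (\partial_i\varphi^j)(x)$$
will correspond to its differential $d\varphi(x) : \mathbb{R}^m \to \mathbb{R}^n$ at this point.

For fixed bases in $\mathbb{R}^m$ and $\mathbb{R}^n$ the correspondence between linear
transformations $L : \mathbb{R}^m \to \mathbb{R}^n$ and $m \times n$ matrices is one-to-one, and hence the
linear transformation $L$ can be identified with the matrix that defines it.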

Even so, we shall as a rule use the symbol f'(x) rather than df(x) to 
denote the Jacobi matrix, since it corresponds better to the traditional dis- 
tinction between the concepts of derivative and differential that holds in the 
one-dimensional case. 

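Thus, by the uniqueness of the differential, at an interior point $x$ of $E$
we obtain the following coordinate notation for (8.31), (8.32), and (8.33),
denoting the equality of the corresponding Jacobi matrices:
$$\bigl(\partial_i(\lambda_1 f_1^j + \lambda_2 f_2^j)\bigr)(x) = \bigl(\lambda_1\partial_i f_1^j + \lambda_2\partial_i f_2^j\bigr)(x) \quad (i = 1,\dots,m,\ j = 1,\dots,n), \tag{8.31'}$$
$$\bigl(\partial_i(f \cdot g)\bigr)(x) = g(x)\,\partial_i f(x) + f(x)\,\partial_i g(x) \quad (i = 1,\dots,m), \tag{8.32'}$$
$$\left(\partial_i\left(\frac{f}{g}\right)\right)(x) = \frac{1}{g^2(x)}\bigl(g(x)\,\partial_i f(x) - f(x)\,\partial_i g(x)\bigr) \quad (i = 1,\dots,m). \tag{8.33'}$$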


It follows from the elementwise equality of these matrices, for example,
that the partial derivative with respect to the variable $x^i$ of the product
$f(x^1,\dots,x^m) \cdot g(x^1,\dots,x^m)$ of real-valued functions $f(x^1,\dots,x^m)$ and
$g(x^1,\dots,x^m)$ should be taken as follows:
$$\frac{\partial (f \cdot g)}{\partial x^i}(x^1,\dots,x^m) = g(x^1,\dots,x^m)\frac{\partial f}{\partial x^i}(x^1,\dots,x^m) + f(x^1,\dots,x^m)\frac{\partial g}{\partial x^i}(x^1,\dots,x^m).$$

We note that both this equality and the matrix equalities (8.31'), (8.32'),
and (8.33') are obvious consequences of the definition of a partial derivative
and the usual rules for differentiating real-valued functions of one real
variable. However, we know that the existence of partial derivatives may still turn
out to be insufficient for a function of several variables to be differentiable.
For that reason, along with the important and completely obvious equalities
(8.31'), (8.32'), and (8.33'), the assertions about the existence of a
differential for the corresponding mapping in Theorems 1 and 2 acquire a particular
importance.

We remark finally that by induction using (8.32) one can obtain the relation
$$d(f_1 \cdots f_k)(x) = (f_2 \cdots f_k)(x)\,df_1(x) + \cdots + (f_1 \cdots f_{k-1})(x)\,df_k(x)$$
for the differential of a product $(f_1 \cdots f_k)$ of differentiable real-valued
functions.


8.3.2 Differentiation of a Composition of Mappings (Chain Rule) 


a. The Main Theorem 


Theorem 3. If the mapping $f : X \to Y$ of a set $X \subset \mathbb{R}^m$ into a set
$Y \subset \mathbb{R}^n$ is differentiable at a point $x \in X$, and the mapping $g : Y \to \mathbb{R}^k$ is
differentiable at the point $y = f(x) \in Y$, then their composition $g \circ f : X \to \mathbb{R}^k$
is differentiable at $x$ and the differential $d(g \circ f)(x) : T\mathbb{R}^m_x \to T\mathbb{R}^k_{g(f(x))}$ of the
composition equals the composition $dg(y) \circ df(x)$ of the differentials
$$df(x) : T\mathbb{R}^m_x \to T\mathbb{R}^n_{f(x)=y}, \qquad dg(y) : T\mathbb{R}^n_y \to T\mathbb{R}^k_{g(y)}.$$

The proof of this theorem repeats almost completely the proof of
Theorem 2 of Sect. 5.2. In order to call attention to a new detail that arises in
this case, we shall nevertheless carry out the proof again, without going into
technical details that have already been discussed.


Proof. Using the differentiability of the mappings $f$ and $g$ at the points $x$
and $y = f(x)$, and also the linearity of the differential $g'(y)$, we can write
$$\begin{aligned}
(g \circ f)(x+h) - (g \circ f)(x) &= g(f(x+h)) - g(f(x)) \\
&= g'(f(x))\bigl(f(x+h) - f(x)\bigr) + o\bigl(f(x+h) - f(x)\bigr) \\
&= g'(y)\bigl(f'(x)h + o(h)\bigr) + o\bigl(f(x+h) - f(x)\bigr) \\
&= g'(y)\bigl(f'(x)h\bigr) + g'(y)\bigl(o(h)\bigr) + o\bigl(f(x+h) - f(x)\bigr) \\
&= \bigl(g'(y) \circ f'(x)\bigr)h + \alpha(x;h),
\end{aligned}$$
where $g'(y) \circ f'(x)$ is a linear mapping (being a composition of linear
mappings), and
$$\alpha(x;h) = g'(y)\bigl(o(h)\bigr) + o\bigl(f(x+h) - f(x)\bigr).$$
But, as relations (8.17) and (8.18) of Sect. 8.1 show,
$$g'(y)\bigl(o(h)\bigr) = o(h) \quad \text{as } h \to 0,$$
$$f(x+h) - f(x) = f'(x)h + o(h) = O(h) + o(h) = O(h) \quad \text{as } h \to 0,$$
and
$$o\bigl(f(x+h) - f(x)\bigr) = o(O(h)) = o(h) \quad \text{as } h \to 0.$$
Consequently,
$$\alpha(x;h) = o(h) + o(h) = o(h) \quad \text{as } h \to 0,$$
and the theorem is proved. $\Box$


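When rewritten in coordinate form, Theorem 3 means that if $x$ is an
interior point of the set $X$ and
$$f'(x) = \begin{pmatrix} \partial_1 f^1(x) & \cdots & \partial_m f^1(x) \\ \vdots & \ddots & \vdots \\ \partial_1 f^n(x) & \cdots & \partial_m f^n(x) \end{pmatrix} = (\partial_i f^j)(x),$$
and $y = f(x)$ is an interior point of the set $Y$ and
$$g'(y) = \begin{pmatrix} \partial_1 g^1(y) & \cdots & \partial_n g^1(y) \\ \vdots & \ddots & \vdots \\ \partial_1 g^k(y) & \cdots & \partial_n g^k(y) \end{pmatrix} = (\partial_j g^l)(y),$$
then
$$(g \circ f)'(x) = \begin{pmatrix} \partial_1(g^1 \circ f)(x) & \cdots & \partial_m(g^1 \circ f)(x) \\ \vdots & \ddots & \vdots \\ \partial_1(g^k \circ f)(x) & \cdots & \partial_m(g^k \circ f)(x) \end{pmatrix} = \bigl(\partial_i(g^l \circ f)\bigr)(x) =$$
$$= \begin{pmatrix} \partial_1 g^1(y) & \cdots & \partial_n g^1(y) \\ \vdots & \ddots & \vdots \\ \partial_1 g^k(y) & \cdots & \partial_n g^k(y) \end{pmatrix} \begin{pmatrix} \partial_1 f^1(x) & \cdots & \partial_m f^1(x) \\ \vdots & \ddots & \vdots \\ \partial_1 f^n(x) & \cdots & \partial_m f^n(x) \end{pmatrix} = \bigl(\partial_j g^l(y) \cdot \partial_i f^j(x)\bigr).$$

In the equality
$$\bigl(\partial_i(g^l \circ f)\bigr)(x) = \bigl(\partial_j g^l(f(x)) \cdot \partial_i f^j(x)\bigr) \tag{8.34}$$
summation is understood on the right-hand side with respect to the index $j$
over its interval of variation, that is, from 1 to $n$.

In contrast to Eqs. (8.31'), (8.32'), and (8.33'), relation (8.34) is nontrivial
even in the sense of elementwise equality of the matrices occurring in it.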
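
Relation (8.34) can be tested numerically by comparing a finite-difference
Jacobi matrix of the composition with the product of the Jacobi matrices of the
factors. The sketch below is our own illustration: the helpers `jacobian` and
`matmul` and the mappings `f` and `g` are assumptions made for the example.

```python
import math

def jacobian(f, x, step=1e-6):
    # Central-difference estimate of the Jacobi matrix of f at x.
    n, m = len(f(x)), len(x)
    J = [[0.0] * m for _ in range(n)]
    for i in range(m):
        up, dn = list(x), list(x)
        up[i] += step
        dn[i] -= step
        f_up, f_dn = f(up), f(dn)
        for j in range(n):
            J[j][i] = (f_up[j] - f_dn[j]) / (2 * step)
    return J

def matmul(A, B):
    # Ordinary matrix product; the sum over k is the sum over j in (8.34).
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def f(x):
    # Illustrative mapping R^2 -> R^3.
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def g(y):
    # Illustrative mapping R^3 -> R^2.
    return [y[0] + y[1] * y[2], math.exp(y[0]) * y[2]]

def composition(x):
    return g(f(x))

x0 = [0.5, -1.2]
lhs = jacobian(composition, x0)                      # (g o f)'(x)
rhs = matmul(jacobian(g, f(x0)), jacobian(f, x0))    # g'(y) f'(x), with y = f(x)
print(lhs)
print(rhs)
```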
Let us now consider some important cases of the theorem just proved. 


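b. The Differential and Partial Derivatives of a Composite Real-valued Function

Let $z = g(y^1,\dots,y^n)$ be a real-valued function of the real
variables $y^1,\dots,y^n$, each of which in turn is a function $y^j = f^j(x^1,\dots,x^m)$
$(j = 1,\dots,n)$ of the variables $x^1,\dots,x^m$. Assuming that the functions $g$ and
$f^j$ are differentiable $(j = 1,\dots,n)$, let us find the partial derivative
$\dfrac{\partial (g \circ f)}{\partial x^i}(x)$ of the composition of the mappings $f : X \to Y$ and $g : Y \to \mathbb{R}$.

According to formula (8.34), in which $l = 1$ under the present conditions,
we find
$$\partial_i(g \circ f)(x) = \partial_j g(f(x)) \cdot \partial_i f^j(x), \tag{8.35}$$
or, in notation that shows more detail,
$$\frac{\partial z}{\partial x^i}(x) = \frac{\partial (g \circ f)}{\partial x^i}(x^1,\dots,x^m) = \frac{\partial g}{\partial y^1}\cdot\frac{\partial y^1}{\partial x^i} + \cdots + \frac{\partial g}{\partial y^n}\cdot\frac{\partial y^n}{\partial x^i} =$$
$$= \partial_1 g(f(x)) \cdot \partial_i f^1(x) + \cdots + \partial_n g(f(x)) \cdot \partial_i f^n(x).$$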

c. The Derivative with Respect to a Vector and the Gradient of a
Function at a Point

Consider the stationary flow of a liquid or gas in some
domain $G$ of $\mathbb{R}^3$. The term "stationary" means that the velocity of the flow
at each point of $G$ does not vary with time, although of course it may vary
from one point of $G$ to another. Suppose, for example, $f(x) = f(x^1, x^2, x^3)$ is
the pressure in the flow at the point $x = (x^1, x^2, x^3) \in G$. If we move about
in the flow according to the law $x = x(t)$, where $t$ is time, we shall record
a pressure $(f \circ x)(t) = f(x(t))$ at time $t$. The rate of variation of pressure
over time along our trajectory is obviously the derivative $\dfrac{d(f \circ x)}{dt}(t)$ of the
function $(f \circ x)(t)$ with respect to time. Let us find this derivative, assuming
that $f(x^1, x^2, x^3)$ is a differentiable function in the domain $G$. By the rule for
differentiating composite functions, we find
$$\frac{d(f \circ x)}{dt}(t) = \frac{\partial f}{\partial x^1}(x(t))\,\dot{x}^1(t) + \frac{\partial f}{\partial x^2}(x(t))\,\dot{x}^2(t) + \frac{\partial f}{\partial x^3}(x(t))\,\dot{x}^3(t), \tag{8.36}$$
where $\dot{x}^i(t) = \dfrac{dx^i}{dt}(t)$ $(i = 1, 2, 3)$.

Since the vector $(\dot{x}^1, \dot{x}^2, \dot{x}^3) = v(t)$ is the velocity of our displacement at
time $t$ and $(\partial_1 f, \partial_2 f, \partial_3 f)(x)$ is the coordinate notation for the differential
$df(x)$ of the function $f$ at the point $x$, Eq. (8.36) can also be rewritten as
$$\frac{d(f \circ x)}{dt}(t) = df(x(t))\,v(t), \tag{8.37}$$
that is, the required quantity is the value of the differential $df(x(t))$ of the
function $f(x)$ at the point $x(t)$ evaluated at the velocity vector $v(t)$ of the
motion.
In particular, if we were at the point $x_0 = x(0)$ at time $t = 0$, then
$$\frac{d(f \circ x)}{dt}(0) = df(x_0)v, \tag{8.38}$$
where $v = v(0)$ is the velocity vector at time $t = 0$.

The right-hand side of (8.38) depends only on the point $x_0 \in G$ and the
velocity vector $v$ that we have at that point; it is independent of the specific
form of the trajectory $x = x(t)$, provided the condition $\dot{x}(0) = v$ holds. That
means that the value of the left-hand side of Eq. (8.38) is the same on any
trajectory of the form
$$x(t) = x_0 + vt + \alpha(t), \tag{8.39}$$
where $\alpha(t) = o(t)$ as $t \to 0$, since this value is completely determined by
giving the point $x_0$ and the vector $v \in T\mathbb{R}^3_{x_0}$ attached at that point. In
particular, if we wished to compute the value of the left-hand side of Eq.
(8.38) directly (and hence also the right-hand side), we could choose the law
of motion to be
$$x(t) = x_0 + vt, \tag{8.40}$$
corresponding to a uniform motion at velocity $v$ under which we are at the
point $x(0) = x_0$ at time $t = 0$.
We now give the following

Definition 1. If the function $f(x)$ is defined in a neighborhood of the point
$x_0 \in \mathbb{R}^m$ and the vector $v \in T\mathbb{R}^m_{x_0}$ is attached at the point $x_0$, then the
quantity
$$D_v f(x_0) := \lim_{t \to 0} \frac{f(x_0 + vt) - f(x_0)}{t} \tag{8.41}$$
(if the indicated limit exists) is called the derivative of $f$ at the point $x_0$ with
respect to the vector $v$ or the derivative along the vector $v$ at the point $x_0$.

It follows from these considerations that if the function $f$ is differentiable
at the point $x_0$, then the following equality holds for any function $x(t)$ of the
form (8.39), and in particular, for any function of the form (8.40):
$$D_v f(x_0) = \frac{d(f \circ x)}{dt}(0) = df(x_0)v. \tag{8.42}$$

In coordinate notation, this equality says
$$D_v f(x_0) = \frac{\partial f}{\partial x^1}(x_0)v^1 + \cdots + \frac{\partial f}{\partial x^m}(x_0)v^m. \tag{8.43}$$

In particular, for the basis vectors $e_1 = (1,0,\dots,0),\dots, e_m = (0,\dots,0,1)$
this formula implies
$$D_{e_i} f(x_0) = \frac{\partial f}{\partial x^i}(x_0) \quad (i = 1,\dots,m).$$
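
The independence of (8.42) from the particular trajectory can be observed
numerically. In the following sketch (our own illustration; the function `f`,
its hand-computed gradient `grad_f`, and the vectors `v` and `w` are
assumptions), the rate of change along a straight path of the form (8.40) and
along a curved path of the form (8.39) both agree with $df(x_0)v$.

```python
import math

def f(x):
    # Hypothetical smooth "pressure" on R^3.
    return x[0] ** 2 + x[0] * x[1] + math.sin(x[2])

def grad_f(x):
    # Partial derivatives of f, computed by hand.
    return [2 * x[0] + x[1], x[0], math.cos(x[2])]

def rate_along(path, t=1e-6):
    # Central-difference estimate of d(f o x)/dt at t = 0.
    return (f(path(t)) - f(path(-t))) / (2 * t)

x0 = [1.0, 2.0, 0.5]
v = [0.3, -1.0, 2.0]
w = [5.0, 5.0, 5.0]   # arbitrary; it bends the path only to second order in t

def straight(t):
    # Trajectory of the form (8.40).
    return [x0[i] + v[i] * t for i in range(3)]

def curved(t):
    # Trajectory of the form (8.39) with alpha(t) = w t^2 = o(t).
    return [x0[i] + v[i] * t + w[i] * t * t for i in range(3)]

df_v = sum(gi * vi for gi, vi in zip(grad_f(x0), v))   # df(x0)v, as in (8.43)
print(df_v, rate_along(straight), rate_along(curved))
```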


By virtue of the linearity of the differential $df(x_0)$, we deduce from Eq.
(8.42) that if $f$ is differentiable at the point $x_0$, then for any vectors
$v_1, v_2 \in T\mathbb{R}^m_{x_0}$ and any $\lambda_1, \lambda_2 \in \mathbb{R}$ the function has a derivative at the point $x_0$
with respect to the vector $(\lambda_1 v_1 + \lambda_2 v_2) \in T\mathbb{R}^m_{x_0}$, and that
$$D_{\lambda_1 v_1 + \lambda_2 v_2} f(x_0) = \lambda_1 D_{v_1} f(x_0) + \lambda_2 D_{v_2} f(x_0). \tag{8.44}$$

If $\mathbb{R}^m$ is regarded as a Euclidean space, that is, as a vector space with an
inner product, then (see Sect. 8.1) it is possible to write any linear functional
$L(v)$ as the inner product $\langle \xi, v\rangle$ of a fixed vector $\xi = \xi(L)$ and the variable
vector $v$.

In particular, there exists a vector $\xi$ such that
$$df(x_0)v = \langle \xi, v\rangle. \tag{8.45}$$


Definition 2. The vector $\xi \in T\mathbb{R}^m_{x_0}$ corresponding to the differential $df(x_0)$
of the function $f$ at the point $x_0$ in the sense of Eq. (8.45) is called the
gradient of the function at that point and is denoted $\operatorname{grad} f(x_0)$.

Thus, by definition
$$df(x_0)v = \langle \operatorname{grad} f(x_0), v\rangle. \tag{8.46}$$

If a Cartesian coordinate system has been chosen in $\mathbb{R}^m$, then, by
comparing relations (8.42), (8.43), and (8.46), we conclude that the gradient has
the following representation in such a coordinate system:
$$\operatorname{grad} f(x_0) = \left(\frac{\partial f}{\partial x^1},\dots,\frac{\partial f}{\partial x^m}\right)(x_0). \tag{8.47}$$


We shall now explain the geometric meaning of the vector $\operatorname{grad} f(x_0)$.

Let $e \in T\mathbb{R}^m_{x_0}$ be a unit vector. Then by (8.46)
$$D_e f(x_0) = \|\operatorname{grad} f(x_0)\| \cos\varphi, \tag{8.48}$$
where $\varphi$ is the angle between the vectors $e$ and $\operatorname{grad} f(x_0)$.

Thus if $\operatorname{grad} f(x_0) \ne 0$ and $e = \|\operatorname{grad} f(x_0)\|^{-1}\operatorname{grad} f(x_0)$, the derivative
$D_e f(x_0)$ assumes a maximum value. That is, the rate of increase of the
function $f$ (expressed in the units of $f$ relative to a unit length in $\mathbb{R}^m$) is maximal
and equal to $\|\operatorname{grad} f(x_0)\|$ for motion from the point $x_0$ precisely when the
displacement is in the direction of the vector $\operatorname{grad} f(x_0)$. The value of the
function decreases most sharply under displacement in the opposite direction,
and the rate of variation of the function is zero in a direction perpendicular
to the vector $\operatorname{grad} f(x_0)$.
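
This extremal property can be checked by brute force in the plane. The sketch
below is an illustration under our own assumptions (the function
$f(x,y) = x^2 + 4y^2$, the point $(2,1)$, and a grid of 3600 directions): it scans
unit vectors $e$ and records where $\langle \operatorname{grad} f, e\rangle$ is largest.

```python
import math

def grad_f(x, y):
    # Gradient of the illustrative function f(x, y) = x^2 + 4y^2.
    return (2 * x, 8 * y)

gx, gy = grad_f(2.0, 1.0)
norm = math.hypot(gx, gy)

best_angle, best_value = None, -math.inf
for k in range(3600):
    theta = 2 * math.pi * k / 3600
    e = (math.cos(theta), math.sin(theta))   # unit vector in direction theta
    value = gx * e[0] + gy * e[1]            # D_e f = <grad f, e>
    if value > best_value:
        best_angle, best_value = theta, value

print(best_value, norm)                  # nearly equal
print(best_angle, math.atan2(gy, gx))    # nearly the same direction
```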

The derivative with respect to a unit vector in a given direction is usually 
called the directional derivative in that direction. 

Since a unit vector in Euclidean space is determined by its direction
cosines
$$e = (\cos\alpha_1,\dots,\cos\alpha_m),$$
where $\alpha_i$ is the angle between the vector $e$ and the basis vector $e_i$ in a
Cartesian coordinate system, it follows that
$$D_e f(x_0) = \langle \operatorname{grad} f(x_0), e\rangle = \frac{\partial f}{\partial x^1}(x_0)\cos\alpha_1 + \cdots + \frac{\partial f}{\partial x^m}(x_0)\cos\alpha_m.$$

The vector grad f(xo) is encountered very frequently and has numerous 
applications. For example the so-called gradient methods for finding extrema 
of functions of several variables numerically (using a computer) are based on 
the geometric property of the gradient just noted. (In this connection, see 
Problem 2 at the end of this section.) 
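
One simple member of this family is descent with a fixed step against the
gradient. The sketch below is our own minimal illustration, not a method
prescribed by the text: the function $f(x,y) = x^2 + 4y^2$, the starting point,
the step 0.1, and the stopping tolerance are all assumptions. For this
particular quadratic any fixed step below 0.25 converges; other functions
require their own step choice.

```python
def grad_f(x, y):
    # Gradient of f(x, y) = x^2 + 4y^2.
    return (2 * x, 8 * y)

def gradient_descent(start, step=0.1, tol=1e-10, max_iter=10_000):
    # Fixed-step descent: move against the gradient until it is tiny.
    x, y = start
    for iteration in range(max_iter):
        gx, gy = grad_f(x, y)
        if gx * gx + gy * gy < tol * tol:
            return (x, y), iteration
        x, y = x - step * gx, y - step * gy
    return (x, y), max_iter

point, iterations = gradient_descent((2.0, 1.0))
print(point, iterations)   # the point is close to the minimum (0, 0)
```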

Many important vector fields, such as, for example, a Newtonian gravi- 
tational field or the electric field due to charge, are the gradients of certain 
scalar-valued functions, known as the potentials of the fields (see Problem 3). 

Many physical laws use the vector $\operatorname{grad} f$ in their very statement. For
example, in the mechanics of continuous media the equivalent of Newton's
basic law of dynamics $ma = F$ is the relation
$$\rho a = -\operatorname{grad} p,$$
which connects the acceleration $a = a(x,t)$ in the flow of an ideal liquid or
gas free of external forces at the point $x$ and time $t$ with the density of the
medium $\rho = \rho(x,t)$ and the gradient of the pressure $p = p(x,t)$ at the same
point and time (see Problem 4).

We shall discuss the vector $\operatorname{grad} f$ again later, when we study vector
analysis and the elements of field theory.


8.3.3 Differentiation of an Inverse Mapping


Theorem 4. Let $f : U(x) \to V(y)$ be a mapping of a neighborhood $U(x) \subset \mathbb{R}^m$
of the point $x$ onto a neighborhood $V(y) \subset \mathbb{R}^m$ of the point $y = f(x)$.
Assume that $f$ is continuous at the point $x$ and has an inverse mapping
$f^{-1} : V(y) \to U(x)$ that is continuous at the point $y$.

Given these assumptions, if the mapping $f$ is differentiable at $x$ and the
tangent mapping $f'(x) : T\mathbb{R}^m_x \to T\mathbb{R}^m_y$ to $f$ at the point $x$ has an inverse
$[f'(x)]^{-1} : T\mathbb{R}^m_y \to T\mathbb{R}^m_x$, then the mapping $f^{-1} : V(y) \to U(x)$ is
differentiable at the point $y = f(x)$, and the following equality holds:
$$(f^{-1})'(y) = [f'(x)]^{-1}.$$

Thus, mutually inverse differentiable mappings have mutually inverse
tangent mappings at corresponding points.


Proof. We use the following notation:
$$f(x) = y, \quad f(x+h) = y + t, \quad t = f(x+h) - f(x),$$
so that
$$f^{-1}(y) = x, \quad f^{-1}(y+t) = x + h, \quad h = f^{-1}(y+t) - f^{-1}(y).$$

We shall assume that $h$ is so small that $x + h \in U(x)$, and hence
$y + t \in V(y)$.

It follows from the continuity of $f$ at $x$ and $f^{-1}$ at $y$ that
$$t = f(x+h) - f(x) \to 0 \quad \text{as } h \to 0 \tag{8.49}$$
and
$$h = f^{-1}(y+t) - f^{-1}(y) \to 0 \quad \text{as } t \to 0. \tag{8.50}$$

It follows from the differentiability of $f$ at $x$ that
$$t = f'(x)h + o(h) \quad \text{as } h \to 0, \tag{8.51}$$
that is, we can even assert that $t = O(h)$ as $h \to 0$ (see relations (8.17) and
(8.18) of Sect. 8.1).

We shall show that if $f'(x)$ is an invertible linear mapping, then we also
have $h = O(t)$ as $t \to 0$.

Indeed, we find successively by (8.51) that
$$[f'(x)]^{-1}t = h + [f'(x)]^{-1}o(h) \quad \text{as } h \to 0, \tag{8.52}$$
$$[f'(x)]^{-1}t = h + o(h) \quad \text{as } h \to 0,$$
$$\bigl\|[f'(x)]^{-1}t\bigr\| \ge \|h\| - \|o(h)\| \quad \text{as } h \to 0,$$
$$\bigl\|[f'(x)]^{-1}t\bigr\| \ge \tfrac{1}{2}\|h\| \quad \text{for } \|h\| < \delta,$$
where the number $\delta > 0$ is chosen so that $\|o(h)\| < \tfrac{1}{2}\|h\|$ when $\|h\| < \delta$.
Then, taking account of (8.50), that is, the relation $h \to 0$ as $t \to 0$, we find
$$\|h\| \le 2\bigl\|[f'(x)]^{-1}t\bigr\| = O(\|t\|) \quad \text{as } t \to 0,$$
which is equivalent to
$$h = O(t) \quad \text{as } t \to 0.$$

From this it follows in particular that
$$o(h) = o(t) \quad \text{as } t \to 0.$$
Taking this relation into account, we find by (8.50) and (8.52) that
$$h = [f'(x)]^{-1}t + o(t) \quad \text{as } t \to 0$$
or
$$f^{-1}(y+t) - f^{-1}(y) = [f'(x)]^{-1}t + o(t) \quad \text{as } t \to 0. \qquad \Box$$


It is known from algebra that if the matrix $A$ corresponds to the linear
transformation $L : \mathbb{R}^m \to \mathbb{R}^m$, then the matrix $A^{-1}$ inverse to $A$ corresponds
to the linear transformation $L^{-1} : \mathbb{R}^m \to \mathbb{R}^m$ inverse to $L$. The construction
of the elements of the inverse matrix is also known from algebra.
Consequently, the theorem just proved provides a direct recipe for constructing the
mapping $(f^{-1})'(y)$.

We remark that when $m = 1$, that is, when $\mathbb{R}^m = \mathbb{R}$, the Jacobian of
the mapping $f : U(x) \to V(y)$ at the point $x$ reduces to the single number
$f'(x)$ (the derivative of the function $f$ at $x$), and the linear transformation
$f'(x) : T\mathbb{R}_x \to T\mathbb{R}_y$ reduces to multiplication by that number: $h \mapsto f'(x)h$.
This linear transformation is invertible if and only if $f'(x) \ne 0$, and the
matrix of the inverse mapping $[f'(x)]^{-1} : T\mathbb{R}_y \to T\mathbb{R}_x$ also consists of a
single number, equal to $[f'(x)]^{-1}$, that is, the reciprocal of $f'(x)$. Hence
Theorem 4 also subsumes the rule for finding the derivative of an inverse
function proved earlier.
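
The equality $(f^{-1})'(y) = [f'(x)]^{-1}$ can be tested numerically on a mapping
whose inverse is known explicitly. The sketch below is our own illustration:
the polar-coordinate mapping `f`, its local inverse `f_inv` (valid for $r > 0$
and $-\pi < \varphi < \pi$), and the helpers `jacobian` and `inverse_2x2` are
assumptions introduced for the example.

```python
import math

def jacobian(f, x, step=1e-6):
    # Central-difference estimate of the Jacobi matrix of f at x.
    n, m = len(f(x)), len(x)
    J = [[0.0] * m for _ in range(n)]
    for i in range(m):
        up, dn = list(x), list(x)
        up[i] += step
        dn[i] -= step
        f_up, f_dn = f(up), f(dn)
        for j in range(n):
            J[j][i] = (f_up[j] - f_dn[j]) / (2 * step)
    return J

def f(p):
    # (r, phi) -> (r cos phi, r sin phi).
    r, phi = p
    return [r * math.cos(phi), r * math.sin(phi)]

def f_inv(q):
    # Local inverse of f for r > 0 and -pi < phi < pi.
    x, y = q
    return [math.hypot(x, y), math.atan2(y, x)]

def inverse_2x2(A):
    # Inverse of a 2 x 2 matrix with nonzero determinant.
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

p = [2.0, math.pi / 6]
lhs = jacobian(f_inv, f(p))          # (f^{-1})'(y) with y = f(x)
rhs = inverse_2x2(jacobian(f, p))    # [f'(x)]^{-1}
print(lhs)
print(rhs)
```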


8.3.4 Problems and Exercises 


1. a) We shall regard two paths $t \mapsto x_1(t)$ and $t \mapsto x_2(t)$ as equivalent at the point
$x_0 \in \mathbb{R}^m$ if $x_1(0) = x_2(0) = x_0$ and $d(x_1(t), x_2(t)) = o(t)$ as $t \to 0$.

Verify that this relation is an equivalence relation, that is, it is reflexive,
symmetric, and transitive.

b) Verify that there is a one-to-one correspondence between vectors $v \in T\mathbb{R}^m_{x_0}$
and equivalence classes of smooth paths at the point $x_0$.

c) By identifying the tangent space $T\mathbb{R}^m_{x_0}$ with the set of equivalence classes
of smooth paths at the point $x_0 \in \mathbb{R}^m$, introduce the operations of addition and
multiplication by a scalar for equivalence classes of paths.

d) Determine whether the operations you have introduced depend on the
coordinate system used in $\mathbb{R}^m$.

2. a) Draw the graph of the function $z = x^2 + 4y^2$, where $(x,y,z)$ are Cartesian
coordinates in $\mathbb{R}^3$.


b) Let $f : G \to \mathbb{R}$ be a numerically valued function defined on a domain $G \subset \mathbb{R}^m$.
A level set ($c$-level) of the function is a set $E \subset G$ on which the function assumes
only one value ($f(E) = c$). More precisely, $E = f^{-1}(c)$. Draw the level sets in $\mathbb{R}^2$
for the function given in part a).


c) Find the gradient of the function $f(x,y) = x^2 + 4y^2$, and verify that at any
point $(x,y)$ the vector $\operatorname{grad} f$ is orthogonal to the level curve of the function $f$
passing through the point.


d) Using the results of a), b), and c), lay out what appears to be the shortest
path on the surface $z = x^2 + 4y^2$ descending from the point $(2,1,8)$ to the lowest
point on the surface $(0,0,0)$.


e) What algorithm, suitable for implementation on a computer, would you
propose for finding the minimum of the function $f(x,y) = x^2 + 4y^2$?


3. We say that a vector field is defined in a domain G of R” if a vector v(x) € TR? 
is assigned to each point x € G. A vector field v(x) in G is called a potential field if 
there is a numerical-valued function U : G > R such that v(x) = grad U(x). The 
function U(z) is called the potential of the field v(x). (In physics it is the function 
—U (x) that is usually called the potential, and the function U (x) is called the force 
function when a field of force is being discussed.) 


a) On a plane with Cartesian coordinates (x,y) draw the field grad f(z, y) 
for each of the following functions: f,(z,y) = z? + y?; fo(z,y) = (z? + y”); 
f(z, y) = arctan(z/y) in the domain y > 0; fa(z, y) = zy. 

b) By Newton's law a particle of mass $m$ at the point $0 \in \mathbb{R}^3$ attracts a particle of mass 1 at the point $x \in \mathbb{R}^3$ ($x \ne 0$) with force $F = -m|r|^{-3} r$, where $r$ is the vector $\overrightarrow{0x}$ (we have omitted the dimensional constant $G_0$). Show that the vector field $F(x)$ in $\mathbb{R}^3 \setminus 0$ is a potential field.


c) Verify that masses $m_i$ ($i = 1, \ldots, n$) located at the points $(\xi_i, \eta_i, \zeta_i)$ ($i = 1, \ldots, n$) respectively, create a Newtonian force field except at these points and that the potential is the function

$$U(x, y, z) = \sum_{i=1}^{n} \frac{m_i}{\sqrt{(x - \xi_i)^2 + (y - \eta_i)^2 + (z - \zeta_i)^2}}.$$


d) Find the potential of the electrostatic field created by point charges $q_i$ ($i = 1, \ldots, n$) located at the points $(\xi_i, \eta_i, \zeta_i)$ ($i = 1, \ldots, n$) respectively.


4. Consider the motion of an ideal incompressible liquid in a space free of external 
forces (in particular, free of gravitational forces). 

Let $v = v(x, y, z, t)$, $a = a(x, y, z, t)$, $\rho = \rho(x, y, z, t)$, and $p = p(x, y, z, t)$ be respectively the velocity, acceleration, density, and pressure of the fluid at the point $(x, y, z)$ of the medium at time $t$.

An ideal liquid is one in which the pressure is the same in all directions at each 
point. 


a) Distinguish a volume of the liquid in the form of a small parallelepiped, one of whose edges is parallel to the vector $\operatorname{grad} p(x, y, z, t)$ (where $\operatorname{grad} p$ is taken with respect to the spatial coordinates). Estimate the force acting on this volume due to the pressure drop, and give an approximate formula for the acceleration of the volume, assuming the fluid is incompressible.


b) Determine whether the result you obtained in a) is consistent with Euler's equation
$$\rho a = -\operatorname{grad} p.$$


c) A curve whose tangent at each point has the direction of the velocity vector at that point is called a streamline. The motion is called stationary if the functions $v$, $a$, $\rho$, and $p$ are independent of $t$. Using b), show that along a streamline in the stationary flow of an incompressible liquid the quantity $\frac{1}{2}\|v\|^2 + p/\rho$ is constant (Bernoulli's law⁵).


d) How do the formulas in a) and b) change if the motion takes place in the gravitational field near the surface of the earth? Show that in this case

$$\rho a = -\operatorname{grad}(\rho g z + p),$$

so that now the quantity $\frac{1}{2}\|v\|^2 + gz + p/\rho$ is constant along each streamline of the stationary motion of an incompressible liquid, where $g$ is the gravitational acceleration and $z$ is the height of the streamline measured from some zero level.


e) Explain, on the basis of the preceding results, why a load-bearing wing has 
a characteristic convex-upward profile. 


f) An incompressible ideal liquid of density $\rho$ was used to fill a cylindrical glass with a circular base of radius $R$ to a depth $h$. The glass was then revolved about its axis with angular velocity $\omega$. Using the incompressibility of the liquid, find the equation $z = f(x, y)$ of its surface in stationary mode (see also Problem 3 of Sect. 5.1).

g) From the equation $z = f(x, y)$ found in part f) for the surface, write a formula $p = p(x, y, z)$ for the pressure at each point $(x, y, z)$ of the volume filled by the rotating liquid. Check to see whether the equation $\rho a = -\operatorname{grad}(\rho g z + p)$ of part d) holds for the formula that you found.


h) Can you now explain why tea leaves sink (although not very rapidly!) and why they accumulate at the center of the bottom of the cup, rather than its side, when the tea is stirred?


5. Estimating the errors in computing the values of a function. 

a) Using the definition of a differentiable function and the approximate equality $\Delta f(x; h) \approx df(x)h$, show that the relative error $\delta = \delta(f(x); h)$ in the value of the product $f(x) = x^1 \cdots x^m$ of $m$ nonzero factors due to errors in determining the factors themselves can be found in the form $\delta \approx \sum\limits_{i=1}^{m} \delta_i$, where $\delta_i$ is the relative error in the determination of the $i$th factor.


5 Daniel Bernoulli (1700-1782) — Swiss scholar, one of the outstanding physicists 
and mathematicians of his time. 
b) Using the equality $d \ln f(x) = \frac{1}{f(x)}\, df(x)$, obtain the result of part a) again and show that in general the relative error in a fraction

$$\frac{f_1 \cdots f_n}{g_1 \cdots g_k}(x_1, \ldots, x_m)$$

can be found as the sum of the relative errors of the values of the functions $f_1, \ldots, f_n, g_1, \ldots, g_k$.
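The first-order estimate of part a) can be checked numerically. The sketch below perturbs each factor of a product by a small signed relative error and compares the actual relative error of the product with the sum of the $\delta_i$. The factor values and error sizes are made-up illustrative numbers.

```python
def product(values):
    """Multiply together all the numbers in `values`."""
    result = 1.0
    for v in values:
        result *= v
    return result


# Hypothetical factors x^1, x^2, x^3 and their signed relative errors.
factors = [2.0, 5.0, 10.0]
rel_errors = [1e-4, -2e-4, 3e-4]

# Each measured factor is x^i * (1 + delta_i).
perturbed = [v * (1 + d) for v, d in zip(factors, rel_errors)]

exact = product(factors)
actual_rel_error = (product(perturbed) - exact) / exact
predicted_rel_error = sum(rel_errors)

print("actual   :", actual_rel_error)
print("predicted:", predicted_rel_error)
```

The two numbers differ only by terms of second order in the $\delta_i$, which is exactly what the approximate equality $\Delta f \approx df$ neglects.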


6. Homogeneous functions and Euler's identity. A function $f : G \to \mathbb{R}$ defined in some domain $G \subset \mathbb{R}^m$ is called homogeneous (resp. positive-homogeneous) of degree $n$ if the equality

$$f(\lambda x) = \lambda^n f(x) \quad (\text{resp. } f(\lambda x) = |\lambda|^n f(x))$$

holds for any $x \in \mathbb{R}^m$ and $\lambda \in \mathbb{R}$ such that $x \in G$ and $\lambda x \in G$.
A function is locally homogeneous of degree $n$ in the domain $G$ if it is a homogeneous function of degree $n$ in some neighborhood of each point of $G$.


a) Prove that in a convex domain every locally homogeneous function is homo- 
geneous. 


b) Let $G$ be the plane $\mathbb{R}^2$ with the ray $L = \{(x, y) \in \mathbb{R}^2 \mid x = 2 \wedge y \ge 0\}$ removed. Verify that the function

$$f(x, y) = \begin{cases} y^4/x, & \text{if } x > 2 \wedge y > 0, \\ y^3, & \text{at other points of the domain,} \end{cases}$$

is locally homogeneous in $G$, but is not a homogeneous function in that domain.


c) Determine the degree of homogeneity or positive homogeneity of the following functions with their natural domains of definition:

$$f_1(x^1, \ldots, x^m) = x^1 x^2 + x^2 x^3 + \cdots + x^{m-1} x^m;$$

$$f_2(x^1, x^2, x^3, x^4) = \frac{x^1 x^2 + x^3 x^4}{x^1 x^2 x^3 + x^2 x^3 x^4};$$

$$f_3(x^1, \ldots, x^m) = |x^1 \cdots x^m|^l.$$


d) By differentiating the equality $f(tx) = t^n f(x)$ with respect to $t$, show that if a differentiable function $f : G \to \mathbb{R}$ is locally homogeneous of degree $n$ in a domain $G \subset \mathbb{R}^m$, it satisfies the following Euler identity for homogeneous functions:

$$x^1 \frac{\partial f}{\partial x^1}(x^1, \ldots, x^m) + \cdots + x^m \frac{\partial f}{\partial x^m}(x^1, \ldots, x^m) \equiv n f(x^1, \ldots, x^m).$$


e) Show that if Euler's identity holds for a differentiable function $f : G \to \mathbb{R}$ in a domain $G$, then that function is locally homogeneous of degree $n$ in $G$.


Hint: Verify that the function $\varphi(t) = t^{-n} f(tx)$ is defined for every $x \in G$ and is constant in some neighborhood of 1.
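Euler's identity from part d) is easy to test numerically. The sketch below uses the homogeneous function $x^1x^2 + x^2x^3$ of degree 2 and approximates the partial derivatives by central differences. The test point and the step `eps` are arbitrary illustrative choices.

```python
def f(x):
    """A homogeneous function of degree 2 in three variables."""
    return x[0] * x[1] + x[1] * x[2]


def partial(func, x, i, eps=1e-6):
    """Central-difference approximation of the i-th partial derivative."""
    forward = list(x)
    backward = list(x)
    forward[i] += eps
    backward[i] -= eps
    return (func(forward) - func(backward)) / (2 * eps)


def euler_lhs(func, x):
    """Left-hand side of Euler's identity: sum of x^i * d_i f(x)."""
    return sum(x[i] * partial(func, x, i) for i in range(len(x)))


point = [1.5, -2.0, 0.5]
degree = 2

lhs = euler_lhs(f, point)
rhs = degree * f(point)
print(lhs, rhs)
```

For a quadratic polynomial the central difference is exact up to rounding, so the two printed values agree to many digits.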
7. Homogeneous functions and the dimension method.

1°. The dimension of a physical quantity and the properties of functional relations between physical quantities.

Physical laws establish interconnections between physical quantities, so that if 
certain units of measurement are adopted for some of these quantities, then the 
units of measurement of the quantities connected with them can be expressed in a 
certain way in terms of the units of measurement of the fixed quantities. That is 
how the basic and derived units of different systems of measurement arise. 

In the International System, the basic mechanical units of measurement are 
taken to be the unit of length (the meter, denoted m), mass (the kilogram, denoted 
kg), and time (the second, denoted s). 

The expression of a derived unit of measurement in terms of the basic me- 
chanical units is called its dimension. This definition will be made more precise 
below. 

The dimension of any mechanical quantity is written symbolically as a formula expressing it in terms of the symbols $L$, $M$, and $T$ proposed by Maxwell⁶ as the dimensions of the basic units mentioned above. For example, the dimensions of velocity, acceleration, and force have respectively the forms

$$[v] = LT^{-1}, \quad [a] = LT^{-2}, \quad [F] = MLT^{-2}.$$


If physical laws are to be independent of the choice of units of measurement, one expression of that invariance should be certain properties of the functional relation

$$x_0 = f(x_1, \ldots, x_k, x_{k+1}, \ldots, x_n) \tag{$*$}$$

between the numerical characteristics of the physical quantities.

Consider, for example, the relation $c = f(a, b) = \sqrt{a^2 + b^2}$ between the lengths of the legs and the length of the hypotenuse of a right triangle. Any change of scale should affect all the lengths equally, so that for all admissible values of $a$ and $b$ the relation $f(\alpha a, \alpha b) = \varphi(\alpha) f(a, b)$ should hold, and in the present case $\varphi(\alpha) = \alpha$.

A basic (and, at first sight, obvious) presupposition of dimension theory is that 
a relation (*) claiming physical significance must be such that when the scales of 
the basic units of measurement are changed, the numerical values of all terms of 
the same type occurring in the formula must be multiplied by the same factor. 

In particular, if $x_1, x_2, x_3$ are basic independent physical quantities and the relation $(x_1, x_2, x_3) \mapsto f(x_1, x_2, x_3)$ expresses the way a fourth physical quantity depends on them, then, by the principle just stated, for any admissible values of $x_1, x_2, x_3$ the equality

$$f(\alpha_1 x_1, \alpha_2 x_2, \alpha_3 x_3) = \varphi(\alpha_1, \alpha_2, \alpha_3) f(x_1, x_2, x_3) \tag{$**$}$$

must hold with some particular function $\varphi$.

The function $\varphi$ in ($**$) characterizes completely the dependence of the numerical value of the physical quantity in question on a change in the scale of the basic fixed physical quantities. Thus, this function should be regarded as the dimension of that physical quantity relative to the fixed basic units of measurement.


6 J.C. Maxwell (1831-1879) — outstanding British physicist. He created the math- 
ematical theory of the electromagnetic field, and is also famous for his research 
in the kinetic theory of gases, optics, and mechanics. 
We now make the form of the dimension function more precise.


a) Let $x \mapsto f(x)$ be a function of one variable satisfying the condition $f(\alpha x) = \varphi(\alpha) f(x)$, where $f$ and $\varphi$ are differentiable functions.

Show that $\varphi(\alpha) = \alpha^d$.

b) Show that the dimension function $\varphi$ in Eq. ($**$) always has the form $\alpha_1^{d_1} \cdot \alpha_2^{d_2} \cdot \alpha_3^{d_3}$, where the exponents $d_1, d_2, d_3$ are certain real numbers. Thus if, for example, the basic units of $L$, $M$, and $T$ are fixed, then the set $(d_1, d_2, d_3)$ of exponents expressed in the power representation $L^{d_1} M^{d_2} T^{d_3}$ can also be regarded as the dimension of the given physical quantity.


c) In part b) it was found that the dimension function is always a power function, 
that is, it is a homogeneous function of a certain degree with respect to each of the 
basic units of measurement. What does it mean if the degree of homogeneity of the 
dimension function of a certain physical quantity relative to one of the basic units 
of measurement is zero? 


2°. The $\Pi$-theorem and the dimension method.

Let $[x_i] = X_i$ ($i = 0, 1, \ldots, n$) be the dimensions of the physical quantities occurring in the law ($*$).

Assume that the dimensions of $x_0, x_{k+1}, \ldots, x_n$ can be expressed in terms of the dimensions of $x_1, \ldots, x_k$, that is,

$$[x_0] = X_0 = X_1^{p_0^1} \cdots X_k^{p_0^k},$$
$$[x_{k+i}] = X_{k+i} = X_1^{p_i^1} \cdots X_k^{p_i^k}, \quad i = 1, \ldots, n - k.$$


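d) Show that the following relation must then hold, along with ($*$):

$$\alpha_1^{p_0^1} \cdots \alpha_k^{p_0^k} x_0 = f\left(\alpha_1 x_1, \ldots, \alpha_k x_k, \alpha_1^{p_1^1} \cdots \alpha_k^{p_1^k} x_{k+1}, \ldots, \alpha_1^{p_{n-k}^1} \cdots \alpha_k^{p_{n-k}^k} x_n\right). \tag{$***$}$$

e) If $x_1, \ldots, x_k$ are independent, we set $\alpha_1 = x_1^{-1}, \ldots, \alpha_k = x_k^{-1}$ in ($***$). Verify that when this is done, ($***$) yields the equality

$$\frac{x_0}{x_1^{p_0^1} \cdots x_k^{p_0^k}} = f\left(1, \ldots, 1, \frac{x_{k+1}}{x_1^{p_1^1} \cdots x_k^{p_1^k}}, \ldots, \frac{x_n}{x_1^{p_{n-k}^1} \cdots x_k^{p_{n-k}^k}}\right),$$

which is a relation

$$\Pi = f(1, \ldots, 1, \Pi_1, \ldots, \Pi_{n-k}) \tag{$****$}$$

involving the dimensionless quantities $\Pi, \Pi_1, \ldots, \Pi_{n-k}$.
Thus we obtain the following

$\Pi$-theorem of dimension theory. If the quantities $x_1, \ldots, x_k$ in relation ($*$) are independent, this relation can be reduced to the function ($****$) of $n - k$ dimensionless parameters.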


f) Verify that if $k = n$, the function $f$ in relation ($*$) can be determined up to a numerical multiple by using the $\Pi$-theorem. Use this method to find the expression $c(\varphi_0)\sqrt{l/g}$ for the period of oscillation of a pendulum (that is, a mass $m$ suspended by a thread of length $l$ and oscillating near the surface of the earth, where $\varphi_0$ is the initial displacement angle).

g) Find a formula $P = c\sqrt{mr/F}$ for the period of revolution of a body of mass $m$ held in a circular orbit by a central force of magnitude $F$.

h) Use Kepler's law $(P_1/P_2)^2 = (r_1/r_2)^3$, which establishes for circular orbits a connection between the ratio of the periods of revolution of planets (or satellites) and the ratio of the radii of their orbits, to find, as Newton did, the exponent $\alpha$ in the law of universal gravitation $F = G\dfrac{m_1 m_2}{r^\alpha}$.


8.4 The Basic Facts of Differential Calculus 
of Real-valued Functions of Several Variables 


8.4.1 The Mean-value Theorem 


Theorem 1. Let $f : G \to \mathbb{R}$ be a real-valued function defined in a region $G \subset \mathbb{R}^m$, and let the closed line segment $[x, x + h]$ with endpoints $x$ and $x + h$ be contained in $G$. If the function $f$ is continuous at the points of the closed line segment $[x, x + h]$ and differentiable at points of the open interval $]x, x + h[$, then there exists a point $\xi \in\, ]x, x + h[$ such that the following equality holds:

$$f(x + h) - f(x) = f'(\xi)h. \tag{8.53}$$


Proof. Consider the auxiliary function

$$F(t) = f(x + th)$$

defined on the closed interval $0 \le t \le 1$. This function satisfies all the hypotheses of Lagrange's theorem: it is continuous on $[0, 1]$, being the composition of continuous mappings, and differentiable on the open interval $]0, 1[$, being the composition of differentiable mappings. Consequently, there exists a point $\theta \in\, ]0, 1[$ such that

$$F(1) - F(0) = F'(\theta) \cdot 1.$$

But $F(1) = f(x + h)$, $F(0) = f(x)$, $F'(\theta) = f'(x + \theta h)h$, and hence the equality just written is the same as the assertion of the theorem. □


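We now give the coordinate form of relation (8.53).
If $x = (x^1, \ldots, x^m)$, $h = (h^1, \ldots, h^m)$, and $\xi = (x^1 + \theta h^1, \ldots, x^m + \theta h^m)$, Eq. (8.53) means that

$$\begin{aligned}
f(x + h) - f(x) &= f(x^1 + h^1, \ldots, x^m + h^m) - f(x^1, \ldots, x^m) = \\
&= f'(\xi)h = \left(\frac{\partial f}{\partial x^1}(\xi), \ldots, \frac{\partial f}{\partial x^m}(\xi)\right)\begin{pmatrix} h^1 \\ \vdots \\ h^m \end{pmatrix} = \\
&= \partial_1 f(\xi)h^1 + \cdots + \partial_m f(\xi)h^m = \\
&= \sum_{i=1}^{m} \partial_i f(x^1 + \theta h^1, \ldots, x^m + \theta h^m)h^i.
\end{aligned}$$

Using the convention of summation on an index that appears as both superscript and subscript, we can finally write

$$f(x^1 + h^1, \ldots, x^m + h^m) - f(x^1, \ldots, x^m) = \partial_i f(x^1 + \theta h^1, \ldots, x^m + \theta h^m)h^i, \tag{8.54}$$

where $0 < \theta < 1$ and $\theta$ depends on both $x$ and $h$.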


Remark. Theorem 1 is called the mean-value theorem because there exists a certain "average" point $\xi \in\, ]x, x + h[$ at which Eq. (8.53) holds. We have already noted in our discussion of Lagrange's theorem (Subsect. 5.3.1) that the mean-value theorem is specific to real-valued functions. A general finite-increment theorem for mappings will be proved in Chap. 10 (Part 2).
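To make the "average" point concrete, the following sketch locates a value $\theta \in\, ]0, 1[$ satisfying (8.53) for one particular function by bisection. The function $f(x^1, x^2) = e^{x^1} + (x^2)^2$, the base point, and the displacement are illustrative assumptions. Bisection is used only because, for this choice, $F'(t) - (F(1) - F(0))$ changes sign on $[0, 1]$.

```python
import math


def f(x1, x2):
    """Illustrative function f(x1, x2) = exp(x1) + x2^2."""
    return math.exp(x1) + x2**2


def differential(x1, x2, h1, h2):
    """f'(x)h = d_1 f(x) h1 + d_2 f(x) h2, with derivatives taken by hand."""
    return math.exp(x1) * h1 + 2 * x2 * h2


def find_theta(x, h, tol=1e-12):
    """Find theta in ]0, 1[ with f(x + h) - f(x) = f'(x + theta h) h."""
    (x1, x2), (h1, h2) = x, h
    increment = f(x1 + h1, x2 + h2) - f(x1, x2)

    def g(t):
        return differential(x1 + t * h1, x2 + t * h2, h1, h2) - increment

    lo, hi = 0.0, 1.0
    if g(lo) * g(hi) >= 0:
        raise ValueError("no sign change on [0, 1]; bisection does not apply")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


theta = find_theta((0.0, 0.0), (1.0, 1.0))
print("theta =", theta)
```

The theorem guarantees only that some such $\theta$ exists. A numerical search like this one needs extra information, here the sign change, to find it.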


The following proposition is a useful corollary of Theorem 1. 


Corollary. If the function $f : G \to \mathbb{R}$ is differentiable in the domain $G \subset \mathbb{R}^m$ and its differential equals zero at every point $x \in G$, then $f$ is constant in the domain $G$.


Proof. The vanishing of a linear transformation is equivalent to the vanishing of all the elements of the matrix corresponding to it. In the present case

$$df(x)h = (\partial_1 f, \ldots, \partial_m f)(x)h,$$

and therefore $\partial_1 f(x) = \cdots = \partial_m f(x) = 0$ at every point $x \in G$.

By definition, a domain is an open connected set. We shall make use of this fact.

We first show that if $x \in G$, then the function $f$ is constant in a ball $B(x; r) \subset G$. Indeed, if $(x + h) \in B(x; r)$, then $[x, x + h] \subset B(x; r) \subset G$. Applying relation (8.53) or (8.54), we obtain

$$f(x + h) - f(x) = f'(\xi)h = 0 \cdot h = 0,$$

that is, $f(x + h) = f(x)$, and the values of $f$ in the ball $B(x; r)$ are all equal to the value at the center of the ball.

Now let $x_0, x_1 \in G$ be arbitrary points of the domain $G$. By the connectedness of $G$, there exists a path $t \mapsto x(t) \in G$ such that $x(0) = x_0$ and $x(1) = x_1$. We assume that the continuous mapping $t \mapsto x(t)$ is defined on the closed interval $0 \le t \le 1$. Let $B(x_0; r)$ be a ball with center at $x_0$ contained in $G$. Since $x(0) = x_0$ and the mapping $t \mapsto x(t)$ is continuous, there is a positive number $\delta$ such that $x(t) \in B(x_0; r) \subset G$ for $0 \le t \le \delta$. Then, by what has been proved, $(f \circ x)(t) \equiv f(x_0)$ on the interval $[0, \delta]$.

Let $l = \sup \delta$, where the upper bound is taken over all numbers $\delta \in [0, 1]$ such that $(f \circ x)(t) \equiv f(x_0)$ on the interval $[0, \delta]$. By the continuity of the function $f(x(t))$ we have $f(x(l)) = f(x_0)$. But then $l = 1$. Indeed, if that were not so, we could take a ball $B(x(l); r) \subset G$, in which $f(x) = f(x(l)) = f(x_0)$, and then by the continuity of the mapping $t \mapsto x(t)$ find $\Delta > 0$ such that $x(t) \in B(x(l); r)$ for $l \le t \le l + \Delta$. But then $(f \circ x)(t) = f(x(l)) = f(x_0)$ for $0 \le t \le l + \Delta$, and so $l \ne \sup \delta$.

Thus we have shown that $(f \circ x)(t) = f(x_0)$ for any $t \in [0, 1]$. In particular $(f \circ x)(1) = f(x_1) = f(x_0)$, and we have verified that the values of the function $f : G \to \mathbb{R}$ are the same at any two points $x_0, x_1 \in G$. □


8.4.2 A Sufficient Condition for Differentiability 
of a Function of Several Variables 


Theorem 2. Let $f : U(x) \to \mathbb{R}$ be a function defined in a neighborhood $U(x) \subset \mathbb{R}^m$ of the point $x = (x^1, \ldots, x^m)$.

If the function $f$ has all partial derivatives $\frac{\partial f}{\partial x^1}, \ldots, \frac{\partial f}{\partial x^m}$ at each point of the neighborhood $U(x)$ and they are continuous at $x$, then $f$ is differentiable at $x$.


Proof. Without loss of generality we shall assume that $U(x)$ is a ball $B(x; r)$. Then, together with the points $x = (x^1, \ldots, x^m)$ and $x + h = (x^1 + h^1, \ldots, x^m + h^m)$, the points $(x^1, x^2 + h^2, \ldots, x^m + h^m), \ldots, (x^1, x^2, \ldots, x^{m-1}, x^m + h^m)$ and the lines connecting them must also belong to the domain $U(x)$. We shall use this fact, applying the Lagrange theorem for functions of one variable in the following computation:

$$\begin{aligned}
f(x + h) - f(x) &= f(x^1 + h^1, \ldots, x^m + h^m) - f(x^1, \ldots, x^m) = \\
&= f(x^1 + h^1, \ldots, x^m + h^m) - f(x^1, x^2 + h^2, \ldots, x^m + h^m) + \\
&\quad + f(x^1, x^2 + h^2, \ldots, x^m + h^m) - f(x^1, x^2, x^3 + h^3, \ldots, x^m + h^m) + \cdots + \\
&\quad + f(x^1, x^2, \ldots, x^{m-1}, x^m + h^m) - f(x^1, \ldots, x^m) = \\
&= \partial_1 f(x^1 + \theta^1 h^1, x^2 + h^2, \ldots, x^m + h^m)h^1 + \\
&\quad + \partial_2 f(x^1, x^2 + \theta^2 h^2, x^3 + h^3, \ldots, x^m + h^m)h^2 + \cdots + \\
&\quad + \partial_m f(x^1, x^2, \ldots, x^{m-1}, x^m + \theta^m h^m)h^m.
\end{aligned}$$

So far we have used only the fact that the function $f$ has partial derivatives with respect to each of its variables in the domain $U(x)$.


We now use the fact that these partial derivatives are continuous at $x$. Continuing the preceding computation, we obtain

$$\begin{aligned}
f(x + h) - f(x) &= \partial_1 f(x^1, \ldots, x^m)h^1 + \alpha^1 h^1 + \\
&\quad + \partial_2 f(x^1, \ldots, x^m)h^2 + \alpha^2 h^2 + \cdots + \\
&\quad + \partial_m f(x^1, \ldots, x^m)h^m + \alpha^m h^m,
\end{aligned}$$

where the quantities $\alpha^1, \ldots, \alpha^m$ tend to zero as $h \to 0$ by virtue of the continuity of the partial derivatives at the point $x$.
But this means that

$$f(x + h) - f(x) = L(x)h + o(h) \quad \text{as } h \to 0,$$

where $L(x)h = \partial_1 f(x^1, \ldots, x^m)h^1 + \cdots + \partial_m f(x^1, \ldots, x^m)h^m$. □


It follows from Theorem 2 that if the partial derivatives of a function $f : G \to \mathbb{R}$ are continuous in the domain $G \subset \mathbb{R}^m$, then the function is differentiable at every point of the domain.

Let us agree from now on to use the symbol $C^{(1)}(G; \mathbb{R})$, or, more simply, $C^{(1)}(G)$ to denote the set of functions having continuous partial derivatives in the domain $G$.


8.4.3 Higher-order Partial Derivatives 


If a function $f : G \to \mathbb{R}$ defined in a domain $G \subset \mathbb{R}^m$ has a partial derivative $\frac{\partial f}{\partial x^i}(x)$ with respect to one of the variables $x^1, \ldots, x^m$, this partial derivative is a function $\partial_i f : G \to \mathbb{R}$, which in turn may have a partial derivative $\partial_j(\partial_i f)(x)$ with respect to a variable $x^j$.

The function $\partial_j(\partial_i f) : G \to \mathbb{R}$ is called the second partial derivative of $f$ with respect to the variables $x^i$ and $x^j$ and is denoted by one of the following symbols:

$$\partial_{ji} f(x), \qquad \frac{\partial^2 f}{\partial x^j \partial x^i}(x).$$

The order of the indices indicates the order in which the differentiation is carried out with respect to the corresponding variables.


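We have now defined partial derivatives of second order.
If a partial derivative of order $k$

$$\partial_{i_1 \cdots i_k} f(x) = \frac{\partial^k f}{\partial x^{i_1} \cdots \partial x^{i_k}}(x)$$

has been defined, we define by induction the partial derivative of order $k + 1$ by the relation

$$\partial_{i i_1 \cdots i_k} f(x) := \partial_i(\partial_{i_1 \cdots i_k} f)(x).$$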


At this point a question arises that is specific for functions of several vari- 
ables: Does the order of differentiation affect the partial derivative computed? 
Theorem 3. If the function $f : G \to \mathbb{R}$ has partial derivatives

$$\frac{\partial^2 f}{\partial x^i \partial x^j}(x), \qquad \frac{\partial^2 f}{\partial x^j \partial x^i}(x)$$

in a domain $G$, then at every point $x \in G$ at which both partial derivatives are continuous, their values are the same.


Proof. Let $x \in G$ be a point at which both functions $\partial_{ij} f : G \to \mathbb{R}$ and $\partial_{ji} f : G \to \mathbb{R}$ are continuous. From this point on all of our arguments are carried out in the context of a ball $B(x; r) \subset G$, $r > 0$, which is a convex neighborhood of the point $x$. We wish to verify that

$$\frac{\partial^2 f}{\partial x^i \partial x^j}(x) = \frac{\partial^2 f}{\partial x^j \partial x^i}(x).$$

Since only the variables $x^i$ and $x^j$ will be changing in the computations to follow, we shall assume for the sake of brevity that $f$ is a function of two variables $f(x^1, x^2)$, and we need to verify that

$$\frac{\partial^2 f}{\partial x^1 \partial x^2}(x^1, x^2) = \frac{\partial^2 f}{\partial x^2 \partial x^1}(x^1, x^2)$$

if the two functions are both continuous at the point $(x^1, x^2)$.
Consider the auxiliary function

$$F(h^1, h^2) = f(x^1 + h^1, x^2 + h^2) - f(x^1 + h^1, x^2) - f(x^1, x^2 + h^2) + f(x^1, x^2),$$

where the displacement $h = (h^1, h^2)$ is assumed to be sufficiently small, namely so small that $x + h \in B(x; r)$.
If we regard $F(h^1, h^2)$ as the difference

$$F(h^1, h^2) = \varphi(1) - \varphi(0),$$

where $\varphi(t) = f(x^1 + th^1, x^2 + h^2) - f(x^1 + th^1, x^2)$, we find by Lagrange's theorem that

$$F(h^1, h^2) = \varphi'(\theta_1) = \left(\partial_1 f(x^1 + \theta_1 h^1, x^2 + h^2) - \partial_1 f(x^1 + \theta_1 h^1, x^2)\right)h^1.$$

Again applying Lagrange's theorem to this last difference, we find that

$$F(h^1, h^2) = \partial_{21} f(x^1 + \theta_1 h^1, x^2 + \theta_2 h^2)h^2 h^1. \tag{8.55}$$

If we now represent $F(h^1, h^2)$ as the difference

$$F(h^1, h^2) = \tilde{\varphi}(1) - \tilde{\varphi}(0),$$

where $\tilde{\varphi}(t) = f(x^1 + h^1, x^2 + th^2) - f(x^1, x^2 + th^2)$, we find similarly that

$$F(h^1, h^2) = \partial_{12} f(x^1 + \tilde{\theta}_1 h^1, x^2 + \tilde{\theta}_2 h^2)h^1 h^2. \tag{8.56}$$
Comparing (8.55) and (8.56), we conclude that

$$\partial_{21} f(x^1 + \theta_1 h^1, x^2 + \theta_2 h^2) = \partial_{12} f(x^1 + \tilde{\theta}_1 h^1, x^2 + \tilde{\theta}_2 h^2), \tag{8.57}$$

where $\theta_1, \theta_2, \tilde{\theta}_1, \tilde{\theta}_2 \in\, ]0, 1[$. Using the continuity of the partial derivatives at the point $(x^1, x^2)$, as $h \to 0$, we get the equality we need as a consequence of (8.57):

$$\partial_{21} f(x^1, x^2) = \partial_{12} f(x^1, x^2). \qquad \Box$$


We remark that without additional assumptions we cannot say in general that $\partial_{ij} f(x) = \partial_{ji} f(x)$ if both of the partial derivatives are defined at the point $x$ (see Problem 2 at the end of this section).
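The auxiliary function $F(h^1, h^2)$ from the proof of Theorem 3 can be watched at work numerically. In the sketch below, the quotient $F(h^1, h^2)/(h^1 h^2)$ is computed for a smooth test function and compared with the mixed partial derivative obtained by hand. The test function $\sin x^1 \cdot e^{x^2}$ and the point are illustrative assumptions.

```python
import math


def f(x1, x2):
    """Smooth test function f(x1, x2) = sin(x1) * exp(x2)."""
    return math.sin(x1) * math.exp(x2)


def mixed_partial_exact(x1, x2):
    """d_12 f = d_21 f = cos(x1) * exp(x2), computed by hand."""
    return math.cos(x1) * math.exp(x2)


def second_difference(x1, x2, h1, h2):
    """The auxiliary function F(h1, h2) used in the proof of Theorem 3."""
    return (f(x1 + h1, x2 + h2) - f(x1 + h1, x2)
            - f(x1, x2 + h2) + f(x1, x2))


x1, x2 = 0.3, -0.2
for h in (1e-1, 1e-2, 1e-3):
    quotient = second_difference(x1, x2, h, h) / (h * h)
    print(h, quotient, mixed_partial_exact(x1, x2))
```

The expression for $F$ is symmetric in the roles of the two variables, which is why both (8.55) and (8.56) describe the same number and why the quotient approaches a single limit when the mixed derivatives are continuous.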

Let us agree to denote the set of functions $f : G \to \mathbb{R}$ all of whose partial derivatives up to order $k$ inclusive are defined and continuous in the domain $G \subset \mathbb{R}^m$ by the symbol $C^{(k)}(G; \mathbb{R})$ or $C^{(k)}(G)$.

As a corollary of Theorem 3, we obtain the following. 


Proposition 1. If $f \in C^{(k)}(G; \mathbb{R})$, the value $\partial_{i_1 \cdots i_k} f(x)$ of the partial derivative is independent of the order $i_1, \ldots, i_k$ of differentiation, that is, remains the same for any permutation of the indices $i_1, \ldots, i_k$.


Proof. In the case $k = 2$ this proposition is contained in Theorem 3.

Let us assume that the proposition holds up to order $n$ inclusive. We shall show that then it also holds for order $n + 1$.

But $\partial_{i_1 i_2 \cdots i_{n+1}} f(x) = \partial_{i_1}(\partial_{i_2 \cdots i_{n+1}} f)(x)$. By the induction assumption the indices $i_2, \ldots, i_{n+1}$ can be permuted without changing the function $\partial_{i_2 \cdots i_{n+1}} f(x)$, and hence without changing $\partial_{i_1 \cdots i_{n+1}} f(x)$. For that reason it suffices to verify that one can also permute, for example, the indices $i_1$ and $i_2$ without changing the value of the derivative $\partial_{i_1 i_2 \cdots i_{n+1}} f(x)$.

Since

$$\partial_{i_1 i_2 \cdots i_{n+1}} f(x) = \partial_{i_1 i_2}(\partial_{i_3 \cdots i_{n+1}} f)(x),$$

the possibility of this permutation follows immediately from Theorem 3. By the induction principle Proposition 1 is proved. □


Example 1. Let $f(x) = f(x^1, x^2)$ be a function of class $C^{(k)}(G; \mathbb{R})$.
Let $h = (h^1, h^2)$ be such that the closed interval $[x, x + h]$ is contained in the domain $G$. We shall show that the function

$$\varphi(t) = f(x + th),$$

which is defined on the closed interval $[0, 1]$, belongs to class $C^{(k)}[0, 1]$ and find its derivative of order $k$ with respect to $t$.
We have

$$\varphi'(t) = \partial_1 f(x^1 + th^1, x^2 + th^2)h^1 + \partial_2 f(x^1 + th^1, x^2 + th^2)h^2,$$

$$\begin{aligned}
\varphi''(t) &= \partial_{11} f(x + th)h^1 h^1 + \partial_{21} f(x + th)h^2 h^1 + \\
&\quad + \partial_{12} f(x + th)h^1 h^2 + \partial_{22} f(x + th)h^2 h^2 = \\
&= \partial_{11} f(x + th)(h^1)^2 + 2\partial_{12} f(x + th)h^1 h^2 + \partial_{22} f(x + th)(h^2)^2.
\end{aligned}$$

These relations can be written as the action of the operator $(h^1 \partial_1 + h^2 \partial_2)$:

$$\varphi'(t) = (h^1 \partial_1 + h^2 \partial_2) f(x + th) = h^i \partial_i f(x + th),$$

$$\varphi''(t) = (h^1 \partial_1 + h^2 \partial_2)^2 f(x + th) = h^{i_1} h^{i_2} \partial_{i_1 i_2} f(x + th).$$

By induction we obtain

$$\varphi^{(k)}(t) = (h^1 \partial_1 + h^2 \partial_2)^k f(x + th) = h^{i_1} \cdots h^{i_k} \partial_{i_1 \cdots i_k} f(x + th)$$

(summation over all sets $i_1, \ldots, i_k$ of $k$ indices, each assuming the values 1 and 2, is meant).


Example 2. If $f(x) = f(x^1, \ldots, x^m)$ and $f \in C^{(k)}(G; \mathbb{R})$, then, under the assumption that $[x, x + h] \subset G$, for the function $\varphi(t) = f(x + th)$ defined on the closed interval $[0, 1]$ we obtain

$$\varphi^{(k)}(t) = h^{i_1} \cdots h^{i_k} \partial_{i_1 \cdots i_k} f(x + th), \tag{8.58}$$

where summation over all sets of indices $i_1, \ldots, i_k$, each assuming all values from 1 to $m$ inclusive, is meant on the right.
We can also write formula (8.58) as

$$\varphi^{(k)}(t) = (h^1 \partial_1 + \cdots + h^m \partial_m)^k f(x + th). \tag{8.59}$$


8.4.4 Taylor’s Formula 


Theorem 4. If the function $f : U(x) \to \mathbb{R}$ is defined and belongs to class $C^{(n)}(U(x); \mathbb{R})$ in a neighborhood $U(x) \subset \mathbb{R}^m$ of the point $x \in \mathbb{R}^m$, and the closed interval $[x, x + h]$ is completely contained in $U(x)$, then the following equality holds:

$$f(x^1 + h^1, \ldots, x^m + h^m) - f(x^1, \ldots, x^m) = \sum_{k=1}^{n-1} \frac{1}{k!}(h^1 \partial_1 + \cdots + h^m \partial_m)^k f(x) + r_{n-1}(x; h), \tag{8.60}$$

where

$$r_{n-1}(x; h) = \int_0^1 \frac{(1 - t)^{n-1}}{(n - 1)!}(h^1 \partial_1 + \cdots + h^m \partial_m)^n f(x + th)\, dt. \tag{8.61}$$

Equality (8.60), together with (8.61), is called Taylor's formula with integral form of the remainder.


Proof. Taylor's formula follows immediately from the corresponding Taylor formula for a function of one variable. In fact, consider the auxiliary function

$$\varphi(t) = f(x + th),$$

which, by the hypotheses of Theorem 4, is defined on the closed interval $0 \le t \le 1$ and (as we have verified above) belongs to the class $C^{(n)}[0, 1]$.

Then for $\tau \in [0, 1]$, by Taylor's formula for functions of one variable, we can write that

$$\varphi(\tau) = \varphi(0) + \frac{1}{1!}\varphi'(0)\tau + \cdots + \frac{1}{(n - 1)!}\varphi^{(n-1)}(0)\tau^{n-1} + \int_0^1 \frac{(1 - t)^{n-1}}{(n - 1)!}\varphi^{(n)}(t\tau)\tau^n\, dt.$$

Setting $\tau = 1$ here, we obtain

$$\varphi(1) = \varphi(0) + \frac{1}{1!}\varphi'(0) + \cdots + \frac{1}{(n - 1)!}\varphi^{(n-1)}(0) + \int_0^1 \frac{(1 - t)^{n-1}}{(n - 1)!}\varphi^{(n)}(t)\, dt. \tag{8.62}$$

Substituting the values

$$\varphi^{(k)}(0) = (h^1 \partial_1 + \cdots + h^m \partial_m)^k f(x) \quad (k = 0, \ldots, n - 1),$$

$$\varphi^{(n)}(t) = (h^1 \partial_1 + \cdots + h^m \partial_m)^n f(x + th),$$

into this equality in accordance with formula (8.59), we find what Theorem 4 asserts. □


Remark. If we write the remainder term in relation (8.62) in the Lagrange form rather than the integral form, then the equality

$$\varphi(1) = \varphi(0) + \frac{1}{1!}\varphi'(0) + \cdots + \frac{1}{(n - 1)!}\varphi^{(n-1)}(0) + \frac{1}{n!}\varphi^{(n)}(\theta),$$

where $0 < \theta < 1$, implies Taylor's formula (8.60) with remainder term

$$r_{n-1}(x; h) = \frac{1}{n!}(h^1 \partial_1 + \cdots + h^m \partial_m)^n f(x + \theta h). \tag{8.63}$$

This form of the remainder term, as in the case of functions of one variable, is called the Lagrange form of the remainder term in Taylor's formula.

Since $f \in C^{(n)}(U(x); \mathbb{R})$, it follows from (8.63) that

$$r_{n-1}(x; h) = \frac{1}{n!}(h^1 \partial_1 + \cdots + h^m \partial_m)^n f(x) + o(\|h\|^n) \quad \text{as } h \to 0,$$

and so we have the equality

$$f(x^1 + h^1, \ldots, x^m + h^m) - f(x^1, \ldots, x^m) = \sum_{k=1}^{n} \frac{1}{k!}(h^1 \partial_1 + \cdots + h^m \partial_m)^k f(x) + o(\|h\|^n) \quad \text{as } h \to 0, \tag{8.64}$$

called Taylor's formula with the remainder term in Peano form.
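As an illustration of formula (8.64) with $n = 2$ and $m = 2$, the sketch below compares a function with its second-order Taylor polynomial and checks that the error divided by $\|h\|^2$ shrinks as $h \to 0$. The function $e^{x^1} \sin x^2$, the expansion point $(0, 0)$, and the sample displacements are illustrative assumptions. The derivative values in the comments were computed by hand.

```python
import math


def f(x1, x2):
    """Test function f(x1, x2) = exp(x1) * sin(x2)."""
    return math.exp(x1) * math.sin(x2)


def taylor2(h1, h2):
    """Terms k = 1, 2 of (8.64) at the point (0, 0).

    At (0, 0): d_1 f = 0, d_2 f = 1, d_11 f = 0, d_12 f = 1, d_22 f = 0.
    """
    first = 0.0 * h1 + 1.0 * h2
    second = 0.0 * h1 * h1 + 2 * 1.0 * h1 * h2 + 0.0 * h2 * h2
    return first + second / 2


def peano_ratio(t):
    """Error of the Taylor polynomial divided by ||h||^2 for h = (t, t)."""
    h1, h2 = t, t
    error = abs(f(h1, h2) - f(0.0, 0.0) - taylor2(h1, h2))
    return error / (h1 * h1 + h2 * h2)


ratios = [peano_ratio(t) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The printed ratios decrease toward zero, which is what the symbol $o(\|h\|^2)$ in the Peano form of the remainder asserts.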


8.4.5 Extrema of Functions of Several Variables 


One of the most important applications of differential calculus is its use in 
finding extrema of functions. 


Definition 1. A function $f : E \to \mathbb{R}$ defined on a set $E \subset \mathbb{R}^m$ has a local maximum (resp. local minimum) at an interior point $x_0$ of $E$ if there exists a neighborhood $U(x_0) \subset E$ of the point $x_0$ such that $f(x) \le f(x_0)$ (resp. $f(x) \ge f(x_0)$) for all $x \in U(x_0)$.

If the strict inequality $f(x) < f(x_0)$ holds for $x \in U(x_0) \setminus x_0$ (or, respectively, $f(x) > f(x_0)$), the function has a strict local maximum (resp. strict local minimum) at $x_0$.


Definition 2. The local minima and maxima of a function are called its local extrema.


Theorem 5. Suppose a function $f : U(x_0) \to \mathbb{R}$ defined in a neighborhood $U(x_0) \subset \mathbb{R}^m$ of the point $x_0 = (x_0^1, \ldots, x_0^m)$ has partial derivatives with respect to each of the variables $x^1, \ldots, x^m$ at the point $x_0$.

Then a necessary condition for the function to have a local extremum at $x_0$ is that the following equalities hold at that point:

$$\frac{\partial f}{\partial x^1}(x_0) = 0, \ \ldots, \ \frac{\partial f}{\partial x^m}(x_0) = 0. \tag{8.65}$$

Proof. Consider the function $\varphi(x^1) = f(x^1, x_0^2, \ldots, x_0^m)$ of one variable defined, according to the hypotheses of the theorem, in some neighborhood of the point $x_0^1$ on the real line. At $x_0^1$ the function $\varphi(x^1)$ has a local extremum, and since

$$\varphi'(x_0^1) = \frac{\partial f}{\partial x^1}(x_0^1, x_0^2, \ldots, x_0^m),$$

it follows that $\frac{\partial f}{\partial x^1}(x_0) = 0$.
The other equalities in (8.65) are proved similarly. □


We call attention to the fact that relations (8.65) give only necessary but not sufficient conditions for an extremum of a function of several variables. An example that confirms this is any example constructed for this purpose for functions of one variable. Thus, where previously we spoke of the function $x \mapsto x^3$, whose derivative is zero at zero, but has no extremum there, we can now consider the function

$$f(x^1, \ldots, x^m) = (x^1)^3,$$

all of whose partial derivatives are zero at $x_0 = (0, \ldots, 0)$, while the function obviously has no extremum at that point.

Theorem 5 shows that if the function $f : G \to \mathbb{R}$ is defined on an open set $G \subset \mathbb{R}^m$, its local extrema are found either among the points at which $f$ is not differentiable or at the points where the differential $df(x_0)$ or, what is the same, the tangent mapping $f'(x_0)$, vanishes.

We know that if a mapping $f : U(x_0) \to \mathbb{R}^n$ defined in a neighborhood $U(x_0) \subset \mathbb{R}^m$ of the point $x_0 \in \mathbb{R}^m$ is differentiable at $x_0$, then the matrix of the tangent mapping $f'(x_0) : \mathbb{R}^m \to \mathbb{R}^n$ has the form

$$\begin{pmatrix} \partial_1 f^1(x_0) & \cdots & \partial_m f^1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f^n(x_0) & \cdots & \partial_m f^n(x_0) \end{pmatrix}. \tag{8.66}$$


Definition 3. The point $x_0$ is a critical point of the mapping $f : U(x_0) \to \mathbb{R}^n$ if the rank of the Jacobi matrix (8.66) of the mapping at that point is less than $\min\{m, n\}$, that is, smaller than the maximum possible value it can have.


In particular, if $n = 1$, the point $x_0$ is critical if condition (8.65) holds, that is, all the partial derivatives of the function $f : U(x_0) \to \mathbb{R}$ vanish.

The critical points of real-valued functions are also called the stationary 
points of these functions. 

After the critical points of a function have been found by solving the 
system (8.65), the subsequent analysis to determine whether they are extrema 
or not can often be carried out using Taylor’s formula and the following 
sufficient conditions for the presence or absence of an extremum provided by 
that formula. 


Theorem 6. Let $f : U(x_0) \to \mathbb{R}$ be a function of class $C^{(2)}(U(x_0); \mathbb{R})$ defined in a neighborhood $U(x_0) \subset \mathbb{R}^m$ of the point $x_0 = (x_0^1, \ldots, x_0^m) \in \mathbb{R}^m$, and let $x_0$ be a critical point of the function $f$.

If, in the Taylor expansion of the function at the point $x_0$

$$f(x_0^1 + h^1, \ldots, x_0^m + h^m) = f(x_0^1, \ldots, x_0^m) + \frac{1}{2!}\sum_{i,j=1}^{m} \frac{\partial^2 f}{\partial x^i \partial x^j}(x_0)h^i h^j + o(\|h\|^2) \tag{8.67}$$

the quadratic form

$$\sum_{i,j=1}^{m} \frac{\partial^2 f}{\partial x^i \partial x^j}(x_0)h^i h^j \equiv \partial_{ij} f(x_0)h^i h^j \tag{8.68}$$

a) is positive-definite or negative-definite, then the function has a local extremum at $x_0$, which is a strict local minimum if the quadratic form (8.68) is positive-definite and a strict local maximum if it is negative-definite;

b) assumes both positive and negative values, then the function does not have an extremum at $x_0$.


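Proof. Let $h \ne 0$ and $x_0 + h \in U(x_0)$. Let us represent (8.67) in the form

$$f(x_0 + h) - f(x_0) = \frac{1}{2!}\|h\|^2 \left[\sum_{i,j=1}^{m} \frac{\partial^2 f}{\partial x^i \partial x^j}(x_0)\frac{h^i}{\|h\|}\frac{h^j}{\|h\|} + o(1)\right], \tag{8.69}$$

where $o(1)$ is infinitesimal as $h \to 0$.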

It is clear from (8.69) that the sign of the difference $f(x_0 + h) - f(x_0)$ is completely determined by the sign of the quantity in brackets. We now undertake to study this quantity.

The vector $e = (h^1/\|h\|, \ldots, h^m/\|h\|)$ obviously has norm 1. The quadratic form (8.68) is continuous as a function of $h \in \mathbb{R}^m$, and therefore its restriction to the unit sphere $S(0; 1) = \{x \in \mathbb{R}^m \mid \|x\| = 1\}$ is also continuous on $S(0; 1)$. But the sphere $S$ is a closed bounded subset in $\mathbb{R}^m$, that is, it is compact. Consequently, the form (8.68) has both a minimum point and a maximum point on $S$, at which it assumes respectively the values $m$ and $M$.

If the form (8.68) is positive-definite, then $0 < m \le M$, and there is a number $\delta > 0$ such that $|o(1)| < m$ for $\|h\| < \delta$. Then for $\|h\| < \delta$ the bracket on the right-hand side of (8.69) is positive, and consequently $f(x_0 + h) - f(x_0) > 0$ for $0 < \|h\| < \delta$. Thus, in this case the point $x_0$ is a strict local minimum of the function.

One can verify similarly that when the form (8.68) is negative-definite, the function has a strict local maximum at the point $x_0$.

Thus a) is now proved. 

We now prove b). 

Let $e_m$ and $e_M$ be points of the unit sphere at which the form (8.68)
assumes the values $m$ and $M$ respectively, and let $m < 0 < M$.

Setting $h = t e_m$, where $t$ is a sufficiently small positive number (so small
that $x_0 + t e_m \in U(x_0)$), we find by (8.69) that

$$f(x_0 + t e_m) - f(x_0) = \frac{1}{2!}\, t^2 (m + o(1)),$$

where $o(1) \to 0$ as $t \to 0$. Starting at some time (that is, for all sufficiently
small values of $t$), the quantity $m + o(1)$ on the right-hand side of this equality
will have the sign of $m$, that is, it will be negative. Consequently, the left-hand
side will also be negative.

Similarly, setting $h = t e_M$, we obtain

$$f(x_0 + t e_M) - f(x_0) = \frac{1}{2!}\, t^2 (M + o(1)),$$

and consequently for all sufficiently small $t$ the difference
$f(x_0 + t e_M) - f(x_0)$ is positive.

Thus, if the quadratic form (8.68) assumes both positive and negative
values on the unit sphere, or, what is obviously equivalent, in $\mathbb{R}^m$, then in
any neighborhood of the point $x_0$ there are both points where the value of
the function is larger than $f(x_0)$ and points where the value is smaller than
$f(x_0)$. Hence, in that case $x_0$ is not a local extremum of the function. □
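
The case analysis of the theorem can be sketched in code for $m = 2$. The
following is only an illustration under stated assumptions: the function names
are hypothetical, the form is given by its coefficients $a = \partial_{11} f(x_0)$,
$b = \partial_{12} f(x_0)$, $c = \partial_{22} f(x_0)$, and the values $m$ and $M$
from the proof are computed as the eigenvalues of the symmetric matrix of the form.

```python
import math

def form_extremes_on_unit_circle(a, b, c):
    """Return (m, M): the minimum and maximum of the quadratic form
    Q(h) = a*h1^2 + 2*b*h1*h2 + c*h2^2 on the unit circle |h| = 1.
    For the symmetric matrix [[a, b], [b, c]] these are its eigenvalues."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius

def classify_critical_point(a, b, c):
    """Apply the sufficient conditions to a critical point with the given
    second partial derivatives (hypothetical helper, two variables only)."""
    m, M = form_extremes_on_unit_circle(a, b, c)
    if m > 0:
        return "strict local minimum"
    if M < 0:
        return "strict local maximum"
    if m < 0 < M:
        return "no extremum"
    # m == 0 or M == 0: the form is only semi-definite.
    return "inconclusive (semi-definite form)"

print(classify_critical_point(2, 0, 2))
print(classify_critical_point(0, 1, 0))
```

In floating-point practice the comparisons with zero would need a tolerance;
the exact comparisons here are kept only to mirror the statement.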


We now make a number of remarks in connection with this theorem. 


Remark 1. Theorem 6 says nothing about the case when the form (8.68) is 
semi-definite, that is, nonpositive or nonnegative. It turns out that in this 
case the point may be an extremum, or it may not. This can be seen, in
particular, from the following example.


Example 3. Let us find the extrema of the function $f(x,y) = x^4 + y^4 - 2x^2$,
which is defined in $\mathbb{R}^2$.

In accordance with the necessary conditions (8.65) we write the system
of equations

$$\begin{cases} \dfrac{\partial f}{\partial x}(x,y) = 4x^3 - 4x = 0, \\[2mm] \dfrac{\partial f}{\partial y}(x,y) = 4y^3 = 0, \end{cases}$$

from which we find three critical points: $(-1,0)$, $(0,0)$, $(1,0)$.


Since

$$\frac{\partial^2 f}{\partial x^2}(x,y) = 12x^2 - 4, \qquad \frac{\partial^2 f}{\partial x \partial y}(x,y) = 0, \qquad \frac{\partial^2 f}{\partial y^2}(x,y) = 12y^2,$$

at the three critical points the quadratic form (8.68) has respectively the form

$$8(h^1)^2, \qquad -4(h^1)^2, \qquad 8(h^1)^2.$$

That is, in all cases it is positive semi-definite or negative semi-definite.
Theorem 6 is not applicable, but since $f(x,y) = (x^2 - 1)^2 + y^4 - 1$, it is obvious
that the function $f(x,y)$ has a strict minimum $-1$ (even a global minimum)
at the points $(-1,0)$ and $(1,0)$, while there is no extremum at $(0,0)$, since
for $x = 0$ and $y \ne 0$ we have $f(0,y) = y^4 > 0$, and for $y = 0$ and sufficiently
small $x \ne 0$ we have $f(x,0) = x^4 - 2x^2 < 0$.
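
The conclusions of Example 3 can be checked numerically. The sketch below is
only an illustration: the step `eps` and the helper names are assumptions, and
sampling a few nearby points is of course not a proof.

```python
def f(x, y):
    return x**4 + y**4 - 2 * x**2

def hessian_form(x, y, h1, h2):
    # Second partials of f: f_xx = 12x^2 - 4, f_xy = 0, f_yy = 12y^2.
    return (12 * x**2 - 4) * h1**2 + 12 * y**2 * h2**2

eps = 1e-2

# Near (0, 0) the function takes values of both signs, so there is no extremum.
near_origin = [f(eps, 0.0), f(0.0, eps)]

# Near (1, 0) every sampled increment of f is positive (strict minimum),
# although the quadratic form vanishes in the direction h = (0, 1).
near_min = [
    f(1 + dx, dy) - f(1.0, 0.0)
    for dx in (-eps, 0.0, eps)
    for dy in (-eps, 0.0, eps)
    if (dx, dy) != (0.0, 0.0)
]

print(near_origin)
print(min(near_min) > 0, hessian_form(1.0, 0.0, 0.0, 1.0))
```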
Remark 2. After the quadratic form (8.68) has been obtained, the study of
its definiteness can be carried out using the Sylvester⁷ criterion. We recall
that by the Sylvester criterion, a quadratic form $\sum_{i,j=1}^{m} a_{ij} x^i x^j$
with symmetric matrix

$$\begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mm} \end{pmatrix}$$

is positive-definite if and only if all its leading principal minors are positive;
the form is negative-definite if and only if $a_{11} < 0$ and the sign of the leading
principal minor reverses each time its order increases by one.
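
A minimal sketch of the Sylvester criterion follows. It is an illustration
under stated assumptions: the matrix is given as a list of rows and is assumed
symmetric, the function names are hypothetical, and the determinant is computed
by cofactor expansion, which is adequate only for small matrices.

```python
def det(a):
    """Determinant by cofactor expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(minor)
    return total

def leading_minors(a):
    """Determinants of the upper-left k x k blocks, k = 1, ..., m."""
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]

def sylvester(a):
    """Classify a symmetric matrix by the signs of its leading minors."""
    minors = leading_minors(a)
    if all(d > 0 for d in minors):
        return "positive-definite"
    # Signs must alternate, starting with a negative minor of order 1.
    if all((d < 0) if k % 2 == 1 else (d > 0)
           for k, d in enumerate(minors, start=1)):
        return "negative-definite"
    return "not definite"

print(leading_minors([[2, -1, 0], [-1, 2, -1], [0, -1, 2]]))
print(sylvester([[-2, 0], [0, -2]]))
```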


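Example 4. Let us find the extrema of the function

$$f(x,y) = xy \ln(x^2 + y^2),$$

which is defined everywhere in the plane $\mathbb{R}^2$ except at the origin.
Solving the system of equations

$$\begin{cases} \dfrac{\partial f}{\partial x}(x,y) = y \ln(x^2 + y^2) + \dfrac{2x^2 y}{x^2 + y^2} = 0, \\[3mm] \dfrac{\partial f}{\partial y}(x,y) = x \ln(x^2 + y^2) + \dfrac{2x y^2}{x^2 + y^2} = 0, \end{cases}$$

we find all the critical points of the function:

$$(0, \pm 1); \quad (\pm 1, 0); \quad \Bigl(\pm\frac{1}{\sqrt{2e}}, \pm\frac{1}{\sqrt{2e}}\Bigr); \quad \Bigl(\pm\frac{1}{\sqrt{2e}}, \mp\frac{1}{\sqrt{2e}}\Bigr).$$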

Since the function is odd with respect to each of its arguments individually,
the points $(0, \pm 1)$ and $(\pm 1, 0)$ are obviously not extrema of the function.

It is also clear that this function does not change its value when the signs
of both variables $x$ and $y$ are changed. Thus by studying only one of the
remaining critical points, for example, $\bigl(\frac{1}{\sqrt{2e}}, \frac{1}{\sqrt{2e}}\bigr)$, we will be able to draw
conclusions on the nature of the others.


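Since

$$\frac{\partial^2 f}{\partial x^2}(x,y) = \frac{6xy}{x^2 + y^2} - \frac{4x^3 y}{(x^2 + y^2)^2},$$

$$\frac{\partial^2 f}{\partial x \partial y}(x,y) = \ln(x^2 + y^2) + 2 - \frac{4x^2 y^2}{(x^2 + y^2)^2},$$

$$\frac{\partial^2 f}{\partial y^2}(x,y) = \frac{6xy}{x^2 + y^2} - \frac{4x y^3}{(x^2 + y^2)^2},$$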
Oy rT? +y (x? + y2) 


7 J. J. Sylvester (1814-1897) — British mathematician. His best-known works were 
on algebra. 
at the point $\bigl(\frac{1}{\sqrt{2e}}, \frac{1}{\sqrt{2e}}\bigr)$ the quadratic form $\partial_{ij} f(x_0) h^i h^j$ has the matrix

$$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix},$$

that is, it is positive-definite, and consequently at that point the function has
a local minimum

$$f\Bigl(\frac{1}{\sqrt{2e}}, \frac{1}{\sqrt{2e}}\Bigr) = -\frac{1}{2e}.$$

By the observations made above on the properties of this function, one
can conclude immediately that

$$f\Bigl(-\frac{1}{\sqrt{2e}}, -\frac{1}{\sqrt{2e}}\Bigr) = -\frac{1}{2e}$$

is also a local minimum and

$$f\Bigl(\frac{1}{\sqrt{2e}}, -\frac{1}{\sqrt{2e}}\Bigr) = f\Bigl(-\frac{1}{\sqrt{2e}}, \frac{1}{\sqrt{2e}}\Bigr) = \frac{1}{2e}$$

are local maxima of the function. This, however, could have been verified
directly, by checking the definiteness of the corresponding quadratic form.
For example, at the point $\bigl(-\frac{1}{\sqrt{2e}}, \frac{1}{\sqrt{2e}}\bigr)$ the matrix of the quadratic form
(8.68) has the form

$$\begin{pmatrix} -2 & 0 \\ 0 & -2 \end{pmatrix},$$

from which it is clear that it is negative-definite.
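
The matrices found in Example 4 can be confirmed by direct evaluation. The
following sketch simply codes the second partial derivatives written out above;
the function and variable names are illustrative assumptions.

```python
import math

def f(x, y):
    return x * y * math.log(x * x + y * y)

def hessian(x, y):
    """Return (f_xx, f_xy, f_yy) of f(x, y) = x*y*ln(x^2 + y^2)."""
    r2 = x * x + y * y
    fxx = 6 * x * y / r2 - 4 * x**3 * y / r2**2
    fxy = math.log(r2) + 2 - 4 * x**2 * y**2 / r2**2
    fyy = 6 * x * y / r2 - 4 * x * y**3 / r2**2
    return fxx, fxy, fyy

a = 1 / math.sqrt(2 * math.e)   # the value 1/sqrt(2e)

print(hessian(a, a))    # close to (2, 0, 2): positive-definite
print(hessian(-a, a))   # close to (-2, 0, -2): negative-definite
print(f(a, a), -1 / (2 * math.e))
```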


Remark 3. It should be kept in mind that we have given necessary conditions 
(Theorem 5) and sufficient conditions (Theorem 6) for an extremum of a 
function only at an interior point of its domain of definition. Thus in seeking 
the absolute maximum or minimum of a function, it is necessary to examine 
the boundary points of the domain of definition along with the critical interior 
points, since the function may assume its maximal or minimal value at one 
of these boundary points. 


The general principles of studying noninterior extrema will be considered 
in more detail later (see the section devoted to extrema with constraint). It is 
useful to keep in mind that in searching for minima and maxima one may use 
certain simple considerations connected with the nature of the problem along 
with the formal techniques, and sometimes even instead of them. For example, 
if a differentiable function being studied in $\mathbb{R}^m$ must have a minimum because
of the nature of the problem and turns out to be unbounded above, then 
if the function has only one critical point, one can assert without further 
investigation that that point is the minimum. 
Example 5. Huygens' problem. On the basis of the laws of conservation of
energy and momentum of a closed mechanical system one can show by a
simple computation that when two perfectly elastic balls having masses $m_1$
and $m_2$ and initial velocities $v_1$ and $v_2$ collide, their velocities after a central
collision (when the velocities are directed along the line joining the centers)
are determined by the relations

$$\tilde v_1 = \frac{(m_1 - m_2) v_1 + 2 m_2 v_2}{m_1 + m_2},$$

$$\tilde v_2 = \frac{(m_2 - m_1) v_2 + 2 m_1 v_1}{m_1 + m_2}.$$

In particular, if a ball of mass $M$ moving with velocity $V$ strikes a motionless
ball of mass $m$, then the velocity $v$ acquired by the latter can be
found from the formula

$$v = \frac{2M}{m + M}\, V, \tag{8.70}$$

from which one can see that if $0 < m < M$, then $V < v < 2V$.

How can a significant part of the kinetic energy of a larger mass be communicated
to a body of small mass? To do this, for example, one can insert
balls with intermediate masses between the balls of small and large mass:
$m < m_1 < m_2 < \cdots < m_n < M$. Let us compute (after Huygens) how the
masses $m_1, m_2, \ldots, m_n$ should be chosen so that the body $m$ will acquire
maximum velocity after successive central collisions.

In accordance with formula (8.70) we obtain the following expression for
the required velocity as a function of the variables $m_1, m_2, \ldots, m_n$:

$$v = \frac{m_1}{m + m_1} \cdot \frac{m_2}{m_1 + m_2} \cdots \frac{m_n}{m_{n-1} + m_n} \cdot \frac{M}{m_n + M} \cdot 2^{n+1} V. \tag{8.71}$$

Thus Huygens' problem reduces to finding the maximum of the function

$$f(m_1, \ldots, m_n) = \frac{m_1}{m + m_1} \cdots \frac{m_n}{m_{n-1} + m_n} \cdot \frac{M}{m_n + M}.$$

The system of equations (8.65), which gives the necessary conditions for
an interior extremum, reduces to the following system in the present case:

$$\begin{cases} m \cdot m_2 - m_1^2 = 0, \\ m_1 \cdot m_3 - m_2^2 = 0, \\ \qquad \cdots \\ m_{n-1} \cdot M - m_n^2 = 0, \end{cases}$$

from which it follows that the numbers $m, m_1, \ldots, m_n, M$ form a geometric
progression with ratio $q$ equal to $\sqrt[n+1]{M/m}$.
The value of the velocity (8.71) that results from this choice of masses is
given by

$$v = \Bigl(\frac{2q}{1 + q}\Bigr)^{n+1} V, \tag{8.72}$$

which agrees with (8.70) if $n = 0$.

It is clear from physical considerations that formula (8.72) gives the maximal
value of the function (8.71). However, this can also be verified formally
(without invoking the cumbersome second derivatives; see Problem 9 at the
end of this section).

We remark that it is clear from (8.72) that if $m \to 0$, then $v \to 2^{n+1} V$.
Thus the intermediate masses do indeed significantly increase the portion of
the kinetic energy of the mass $M$ that is transmitted to the small mass $m$.
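
Formulas (8.70) to (8.72) are easy to try out numerically. The sketch below is
an illustration only: the helper names and the sample masses $m = 1$, $M = 16$
are assumptions, and the chain of balls is listed from the striking ball $M$
down to the last ball $m$.

```python
def final_velocity(masses, V=1.0):
    """Velocity of the last ball after successive central collisions.
    `masses` is ordered from the striking ball to the last ball at rest."""
    v = V
    for big, small in zip(masses, masses[1:]):
        v = 2 * big / (small + big) * v   # formula (8.70) for one collision
    return v

def geometric_chain(m, M, n):
    """Masses M, M/q, ..., m with q = (M/m)**(1/(n+1)), n intermediate balls."""
    q = (M / m) ** (1.0 / (n + 1))
    return [M / q**k for k in range(n + 2)]

def huygens_velocity(m, M, n, V=1.0):
    q = (M / m) ** (1.0 / (n + 1))
    return (2 * q / (1 + q)) ** (n + 1) * V   # formula (8.72)

chain = geometric_chain(1.0, 16.0, 1)        # masses 16, 4, 1
print(final_velocity(chain), huygens_velocity(1.0, 16.0, 1))
print(final_velocity([16.0, 8.5, 1.0]))      # a non-geometric middle mass
print(final_velocity([16.0, 1.0]))           # no intermediate ball
```

For these sample values the geometric choice of the middle mass gives a larger
final velocity than the other two chains, in agreement with the example.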


8.4.6 Some Geometric Images Connected 
with Functions of Several Variables 


a. The Graph of a Function and Curvilinear Coordinates Let $x$,
$y$, and $z$ be Cartesian coordinates of a point in $\mathbb{R}^3$ and let $z = f(x,y)$ be a
continuous function defined in some domain $G$ of the plane $\mathbb{R}^2$ of the variables
$x$ and $y$.

By the general definition of the graph of a function, the graph of the
function $f : G \to \mathbb{R}$ in our case is the set
$S = \{(x,y,z) \in \mathbb{R}^3 \mid (x,y) \in G,\ z = f(x,y)\}$ in the space $\mathbb{R}^3$.

It is obvious that the mapping $G \xrightarrow{F} S$ defined by the relation
$(x,y) \mapsto (x, y, f(x,y))$ is a continuous one-to-one mapping of $G$ onto $S$, by which one
can determine every point of $S$ by exhibiting the point of $G$ corresponding
to it, or, what is the same, giving the coordinates $(x,y)$ of this point of $G$.

Thus the pairs of numbers $(x,y) \in G$ can be regarded as certain coordinates
of the points of a set $S$, the graph of the function $z = f(x,y)$. Since
the points of $S$ are given by pairs of numbers, we shall conditionally agree to
call $S$ a two-dimensional surface in $\mathbb{R}^3$. (The general definition of a surface
will be given later.)

If we define a path $\Gamma : I \to G$ in $G$, then a path $F \circ \Gamma : I \to S$
automatically appears on the surface $S$. If $x = x(t)$ and $y = y(t)$ is a parametric
definition of the path $\Gamma$, then the path $F \circ \Gamma$ on $S$ is given by the three
functions $x = x(t)$, $y = y(t)$, $z = z(t) = f(x(t), y(t))$. In particular, if we set
$x = x_0 + t$, $y = y_0$, we obtain a curve $x = x_0 + t$, $y = y_0$, $z = f(x_0 + t, y_0)$ on
the surface $S$ along which the coordinate $y = y_0$ of the points of $S$ does not
change. Similarly one can exhibit a curve $x = x_0$, $y = y_0 + t$, $z = f(x_0, y_0 + t)$
on $S$ along which the first coordinate $x_0$ of the points of $S$ does not change.
By analogy with the planar case these curves on $S$ are naturally called
coordinate lines on the surface $S$. However, in contrast to the coordinate lines in
$G \subset \mathbb{R}^2$, which are pieces of straight lines, the coordinate lines on $S$ are in
general curves in $\mathbb{R}^3$. For that reason, the coordinates $(x,y)$ of points of the
surface $S$ are often called curvilinear coordinates on $S$.

Thus the graph of a continuous function $z = f(x,y)$, defined in a domain
$G \subset \mathbb{R}^2$, is a two-dimensional surface $S$ in $\mathbb{R}^3$ whose points can be defined by
curvilinear coordinates $(x,y) \in G$.

At this point we shall not go into detail on the general definition of a
surface, since we are interested only in a special case of a surface: the graph
of a function. However, we assume that from the course in analytic geometry
the reader is well acquainted with some important particular surfaces in $\mathbb{R}^3$
(such as a plane, an ellipsoid, paraboloids, and hyperboloids).


b. The Tangent Plane to the Graph of a Function Differentiability of
a function $z = f(x,y)$ at the point $(x_0, y_0) \in G$ means that

$$f(x,y) = f(x_0, y_0) + A(x - x_0) + B(y - y_0) + o\Bigl(\sqrt{(x - x_0)^2 + (y - y_0)^2}\Bigr) \quad \text{as } (x,y) \to (x_0, y_0), \tag{8.73}$$

where $A$ and $B$ are certain constants.

In $\mathbb{R}^3$ let us consider the plane

$$z = z_0 + A(x - x_0) + B(y - y_0), \tag{8.74}$$

where $z_0 = f(x_0, y_0)$. Comparing equalities (8.73) and (8.74), we see that the
graph of the function is well approximated by the plane (8.74) in a
neighborhood of the point $(x_0, y_0, z_0)$. More precisely, the point $(x, y, f(x,y))$ of
the graph of the function differs from the point $(x, y, z(x,y))$ of the plane
(8.74) by an amount that is infinitesimal in comparison with the magnitude
$\sqrt{(x - x_0)^2 + (y - y_0)^2}$ of the displacement of its curvilinear coordinates
$(x,y)$ from the coordinates $(x_0, y_0)$ of the point $(x_0, y_0, z_0)$.

By the uniqueness of the differential of a function, the plane (8.74)
possessing this property is unique and has the form

$$z = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0). \tag{8.75}$$

This plane is called the tangent plane to the graph of the function $z = f(x,y)$
at the point $(x_0, y_0, f(x_0, y_0))$.

Thus, the differentiability of a function $z = f(x,y)$ at the point $(x_0, y_0)$
and the existence of a tangent plane to the graph of this function at the point
$(x_0, y_0, f(x_0, y_0))$ are equivalent conditions.


c. The Normal Vector Writing Eq. (8.75) for the tangent plane in the
canonical form

$$\frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0) - \bigl(z - f(x_0, y_0)\bigr) = 0,$$

we conclude that the vector

$$\Bigl(\frac{\partial f}{\partial x}(x_0, y_0),\ \frac{\partial f}{\partial y}(x_0, y_0),\ -1\Bigr) \tag{8.76}$$

is the normal vector to the tangent plane. Its direction is considered to be the
direction normal or orthogonal to the surface $S$ (the graph of the function)
at the point $(x_0, y_0, f(x_0, y_0))$.

In particular, if $(x_0, y_0)$ is a critical point of the function $f(x,y)$, then
the normal vector to the graph at the point $(x_0, y_0, f(x_0, y_0))$ has the form
$(0, 0, -1)$ and consequently, the tangent plane to the graph of the function at
such a point is horizontal (parallel to the $xy$-plane).

The three graphs in Fig. 8.1 illustrate what has just been said. 

Figures 8.1a and c depict the location of the graph of a function with
respect to the tangent plane in a neighborhood of a local extremum (minimum
and maximum respectively), while Fig. 8.1b shows the graph in the
neighborhood of a so-called saddle point.
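
Formulas (8.75) and (8.76) can be sketched numerically. In the following
illustration the partial derivatives are approximated by central differences;
the step `h`, the helper names, and the sample function `paraboloid` are
assumptions made for the example and do not come from the text.

```python
def partials(f, x0, y0, h=1e-6):
    """Central-difference approximations to f_x and f_y at (x0, y0)."""
    fx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
    fy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)
    return fx, fy

def tangent_plane(f, x0, y0):
    """Return the function z(x, y) of the plane (8.75) and the normal (8.76)."""
    fx, fy = partials(f, x0, y0)
    z0 = f(x0, y0)

    def plane(x, y):
        return z0 + fx * (x - x0) + fy * (y - y0)

    normal = (fx, fy, -1.0)
    return plane, normal

def paraboloid(x, y):      # sample surface z = x^2 + y^2
    return x * x + y * y

plane, normal = tangent_plane(paraboloid, 1.0, 2.0)
print(normal)              # approximately (2, 4, -1)

_, normal_at_critical = tangent_plane(paraboloid, 0.0, 0.0)
print(normal_at_critical)  # approximately (0, 0, -1): horizontal tangent plane
```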


d. Tangent Planes and Tangent Vectors We know that if a path
$\Gamma : I \to \mathbb{R}^3$ in $\mathbb{R}^3$ is given by differentiable functions $x = x(t)$, $y = y(t)$, $z = z(t)$,
then the vector $(\dot x(0), \dot y(0), \dot z(0))$ is the velocity vector at time $t = 0$. It is a
direction vector of the tangent at the point $x_0 = x(0)$, $y_0 = y(0)$, $z_0 = z(0)$
to the curve in $\mathbb{R}^3$ that is the support of the path $\Gamma$.

Now let us consider a path $\Gamma : I \to S$ on the graph of a function
$z = f(x,y)$ given in the form $x = x(t)$, $y = y(t)$, $z = f(x(t), y(t))$. In this
particular case we find that

$$(\dot x(0), \dot y(0), \dot z(0)) = \Bigl(\dot x(0),\ \dot y(0),\ \frac{\partial f}{\partial x}(x_0, y_0)\, \dot x(0) + \frac{\partial f}{\partial y}(x_0, y_0)\, \dot y(0)\Bigr),$$

from which it can be seen that this vector is orthogonal to the vector (8.76)
normal to the graph $S$ of the function at the point $(x_0, y_0, f(x_0, y_0))$. Thus
we have shown that if a vector $(\xi, \eta, \zeta)$ is tangent to a curve on the surface $S$
at the point $(x_0, y_0, f(x_0, y_0))$, then it is orthogonal to the vector (8.76) and
(in this sense) lies in the plane (8.75) tangent to the surface $S$ at the point
in question. More precisely we could say that the whole line $x = x_0 + \xi t$,
$y = y_0 + \eta t$, $z = f(x_0, y_0) + \zeta t$ lies in the tangent plane (8.75).

Let us now show that the converse is also true, that is, if a line $x = x_0 + \xi t$,
$y = y_0 + \eta t$, $z = f(x_0, y_0) + \zeta t$, or what is the same, the vector $(\xi, \eta, \zeta)$, lies
in the plane (8.75), then there is a path on $S$ for which the vector $(\xi, \eta, \zeta)$ is
the velocity vector at the point $(x_0, y_0, f(x_0, y_0))$.

The path can be taken, for example, to be

$$x = x_0 + \xi t, \qquad y = y_0 + \eta t, \qquad z = f(x_0 + \xi t, y_0 + \eta t).$$

In fact, for this path,

$$\dot x(0) = \xi, \qquad \dot y(0) = \eta, \qquad \dot z(0) = \frac{\partial f}{\partial x}(x_0, y_0)\, \xi + \frac{\partial f}{\partial y}(x_0, y_0)\, \eta.$$

In view of the equality

$$\frac{\partial f}{\partial x}(x_0, y_0)\, \dot x(0) + \frac{\partial f}{\partial y}(x_0, y_0)\, \dot y(0) - \dot z(0) = 0$$

and the hypothesis that

$$\frac{\partial f}{\partial x}(x_0, y_0)\, \xi + \frac{\partial f}{\partial y}(x_0, y_0)\, \eta - \zeta = 0,$$

we conclude that

$$(\dot x(0), \dot y(0), \dot z(0)) = (\xi, \eta, \zeta).$$

Hence the tangent plane to the surface $S$ at the point $(x_0, y_0, z_0)$ is formed
by the vectors that are tangents at the point $(x_0, y_0, z_0)$ to curves on the
surface $S$ passing through the point (see Fig. 8.2).


Fig. 8.2. 


This is a more geometric description of the tangent plane. In any case, 
one can see from it that if the tangent to a curve is invariantly defined (with 
respect to the choice of coordinates), then the tangent plane is also invariantly 
defined. 
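
The orthogonality of tangent vectors to the normal vector (8.76) can be
illustrated numerically. The sketch below makes several assumptions: the
surface `f` is a hypothetical sample, its partial derivatives are entered by
hand in `normal`, and the velocity of the path $x = x_0 + \xi t$,
$y = y_0 + \eta t$, $z = f(x_0 + \xi t, y_0 + \eta t)$ is approximated by a
central difference in $t$.

```python
import math

def f(x, y):
    # Hypothetical sample surface z = f(x, y).
    return math.sin(x) * y + x * x

def normal(x0, y0):
    # Vector (8.76), using the analytic partial derivatives of the sample f.
    return (math.cos(x0) * y0 + 2 * x0, math.sin(x0), -1.0)

def velocity_on_graph(x0, y0, xi, eta, h=1e-6):
    """Velocity at t = 0 of the path t -> (x0 + xi*t, y0 + eta*t, f(...))."""
    def z(t):
        return f(x0 + xi * t, y0 + eta * t)
    return (xi, eta, (z(h) - z(-h)) / (2 * h))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

v = velocity_on_graph(0.3, -1.2, 2.0, 0.5)
print(dot(v, normal(0.3, -1.2)))   # close to 0 up to discretization error
```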
We have been considering functions of two variables for the sake of
visualizability, but everything that was said obviously carries over to the general
case of a function

$$y = f(x^1, \ldots, x^m) \tag{8.77}$$

of $m$ variables, where $m \in \mathbb{N}$.

At the point $(x_0^1, \ldots, x_0^m, f(x_0^1, \ldots, x_0^m))$ the plane tangent to the graph
of such a function can be written in the form

$$y = f(x_0^1, \ldots, x_0^m) + \sum_{i=1}^{m} \frac{\partial f}{\partial x^i}(x_0^1, \ldots, x_0^m)(x^i - x_0^i); \tag{8.78}$$

the vector

$$\Bigl(\frac{\partial f}{\partial x^1}(x_0), \ldots, \frac{\partial f}{\partial x^m}(x_0), -1\Bigr)$$

is the normal vector to the plane (8.78). This plane itself, like the graph of
the function (8.77), has dimension $m$, that is, any point is now given by a set
$(x^1, \ldots, x^m)$ of $m$ coordinates.

Thus, Eq. (8.78) defines a hyperplane in $\mathbb{R}^{m+1}$.

Repeating verbatim the reasoning above, one can verify that the tangent
plane (8.78) consists of vectors that are tangent to curves passing through
the point $(x_0^1, \ldots, x_0^m, f(x_0^1, \ldots, x_0^m))$ and lying on the $m$-dimensional surface
$S$, the graph of the function (8.77).


8.4.7 Problems and Exercises 


1. Let $z = f(x,y)$ be a function of class $C^{(1)}(G; \mathbb{R})$.

a) If $\frac{\partial f}{\partial y}(x,y) \equiv 0$ in $G$, can one assert that $f$ is independent of $y$ in $G$?

b) Under what condition on the domain $G$ does the preceding question have an
affirmative answer?


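2. a) Verify that for the function

$$f(x,y) = \begin{cases} xy\, \dfrac{x^2 - y^2}{x^2 + y^2}, & \text{if } x^2 + y^2 \ne 0, \\[2mm] 0, & \text{if } x^2 + y^2 = 0, \end{cases}$$

the following relations hold:

$$\frac{\partial^2 f}{\partial x \partial y}(0,0) = 1 \ne -1 = \frac{\partial^2 f}{\partial y \partial x}(0,0).$$

b) Prove that if the function $f(x,y)$ has partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ in some
neighborhood $U$ of the point $(x_0, y_0)$, and if the mixed derivative $\frac{\partial^2 f}{\partial x \partial y}$ (or $\frac{\partial^2 f}{\partial y \partial x}$)
exists in $U$ and is continuous at $(x_0, y_0)$, then the mixed derivative $\frac{\partial^2 f}{\partial y \partial x}$ (resp.
$\frac{\partial^2 f}{\partial x \partial y}$) also exists at that point and the following equality holds:

$$\frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) = \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0).$$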
3. Let $x^1, \ldots, x^m$ be Cartesian coordinates in $\mathbb{R}^m$. The differential operator

$$\Delta = \sum_{i=1}^{m} \frac{\partial^2}{\partial (x^i)^2},$$

acting on functions $f \in C^{(2)}(G; \mathbb{R})$ according to the rule

$$\Delta f = \sum_{i=1}^{m} \frac{\partial^2 f}{\partial (x^i)^2}(x^1, \ldots, x^m),$$

is called the Laplacian.
The equation $\Delta f = 0$ for the function $f$ in the domain $G \subset \mathbb{R}^m$ is called
Laplace's equation, and its solutions are called harmonic functions in the domain $G$.

a) Show that if $x = (x^1, \ldots, x^m)$ and

$$\|x\| = \sqrt{\sum_{i=1}^{m} (x^i)^2},$$

then for $m > 2$ the function

$$f(x) = \|x\|^{2 - m}$$

is harmonic in the domain $\mathbb{R}^m \setminus 0$, where $0 = (0, \ldots, 0)$.

b) Verify that the function

$$f(x^1, \ldots, x^m, t) = \frac{1}{(2a\sqrt{\pi t})^m} \exp\Bigl(-\frac{\|x\|^2}{4a^2 t}\Bigr),$$

which is defined for $t > 0$ and $x = (x^1, \ldots, x^m) \in \mathbb{R}^m$, satisfies the heat equation

$$\frac{\partial f}{\partial t} = a^2 \Delta f,$$

that is, verify that $\frac{\partial f}{\partial t} = a^2 \sum_{i=1}^{m} \frac{\partial^2 f}{\partial (x^i)^2}$ at each point of the domain of definition of the
function.


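4. Taylor's formula in multi-index notation. The symbol $\alpha := (\alpha_1, \ldots, \alpha_m)$
consisting of nonnegative integers $\alpha_i$, $i = 1, \ldots, m$, is called the multi-index $\alpha$.
The following notation is conventional:

$$|\alpha| := \alpha_1 + \cdots + \alpha_m,$$

$$\alpha! := \alpha_1! \cdots \alpha_m!;$$

finally, if $a = (a_1, \ldots, a_m)$, then

$$a^\alpha := a_1^{\alpha_1} \cdots a_m^{\alpha_m}.$$

a) Verify that if $k \in \mathbb{N}$, then

$$(a_1 + \cdots + a_m)^k = \sum_{|\alpha| = k} \frac{k!}{\alpha_1! \cdots \alpha_m!}\, a_1^{\alpha_1} \cdots a_m^{\alpha_m},$$

or

$$(a_1 + \cdots + a_m)^k = \sum_{|\alpha| = k} \frac{k!}{\alpha!}\, a^\alpha,$$

where the summation extends over all sets $\alpha = (\alpha_1, \ldots, \alpha_m)$ of nonnegative integers
such that $\sum_{i=1}^{m} \alpha_i = k$.

b) Let

$$D^\alpha f(x) := \frac{\partial^{|\alpha|} f}{(\partial x^1)^{\alpha_1} \cdots (\partial x^m)^{\alpha_m}}(x).$$

Show that if $f \in C^{(k)}(G; \mathbb{R})$, then the equality

$$\sum_{i_1 + \cdots + i_m = k} \partial_{i_1 \cdots i_k} f(x)\, h^{i_1} \cdots h^{i_k} = \sum_{|\alpha| = k} \frac{k!}{\alpha!}\, D^\alpha f(x)\, h^\alpha,$$

where $h = (h^1, \ldots, h^m)$, holds at any point $x \in G$.

c) Verify that in multi-index notation Taylor's theorem with the Lagrange form
of the remainder, for example, can be written as

$$f(x + h) = \sum_{|\alpha| = 0}^{n-1} \frac{1}{\alpha!}\, D^\alpha f(x)\, h^\alpha + \sum_{|\alpha| = n} \frac{1}{\alpha!}\, D^\alpha f(x + \theta h)\, h^\alpha.$$

d) Write Taylor's formula in multi-index notation with the integral form of the
remainder (Theorem 4).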


5. a) Let $I^m = \{x = (x^1, \ldots, x^m) \in \mathbb{R}^m \mid |x^i| \le c^i,\ i = 1, \ldots, m\}$ be an
$m$-dimensional closed interval and $I$ a closed interval $[a,b] \subset \mathbb{R}$. Show that if the
function $f(x,y) = f(x^1, \ldots, x^m, y)$ is defined and continuous on the set $I^m \times I$, then
for any positive number $\varepsilon > 0$ there exists a number $\delta > 0$ such that
$|f(x, y_1) - f(x, y_2)| < \varepsilon$ if $x \in I^m$, $y_1, y_2 \in I$, and $|y_1 - y_2| < \delta$.

b) Show that the function

$$F(x) = \int_a^b f(x,y)\, dy$$

is defined and continuous on the closed interval $I^m$.

c) Show that if $f \in C(I^m; \mathbb{R})$, then the function

$$F(x, t) = f(tx)$$

is defined and continuous on $I^m \times I^1$, where $I^1 = \{t \in \mathbb{R} \mid |t| \le 1\}$.

d) Prove Hadamard's lemma:

If $f \in C^{(1)}(I^m; \mathbb{R})$ and $f(0) = 0$, there exist functions $g_1, \ldots, g_m \in C(I^m; \mathbb{R})$
such that

$$f(x^1, \ldots, x^m) = \sum_{i=1}^{m} x^i g_i(x^1, \ldots, x^m)$$

in $I^m$, and in addition

$$g_i(0) = \frac{\partial f}{\partial x^i}(0), \quad i = 1, \ldots, m.$$


6. Prove the following generalization of Rolle’s theorem for functions of several 
variables. 


If the function $f$ is continuous in a closed ball $\overline{B}(0;r)$, equal to zero on the
boundary of the ball, and differentiable in the open ball $B(0;r)$, then at least one of
the points of the open ball is a critical point of the function.


7. Verify that the function

$$f(x,y) = (y - x^2)(y - 3x^2)$$


does not have an extremum at the origin, even though its restriction to each line 
passing through the origin has a strict local minimum at that point. 


8. The method of least squares. This is one of the commonest methods of processing
the results of observations. It consists of the following. Suppose it is known that
the physical quantities $x$ and $y$ are linearly related:

$$y = ax + b \tag{8.79}$$

or suppose an empirical formula of this type has been constructed on the basis of
experimental data.

Let us assume that $n$ observations have been made, in each of which both
$x$ and $y$ were measured, resulting in $n$ pairs of values $x_1, y_1; \ldots; x_n, y_n$. Since the
measurements have errors, even if the relation (8.79) is exact, the equalities

$$y_k = a x_k + b$$

may fail to hold for some of the values of $k \in \{1, \ldots, n\}$, no matter what the
coefficients $a$ and $b$ are.

The problem is to determine the unknown coefficients $a$ and $b$ in a reasonable
way from these observational results.

Basing his argument on analysis of the probability distribution of the magnitude
of observational errors, Gauss established that the most probable values for the
coefficients $a$ and $b$ with a given set of observational results should be sought by
use of the following least-squares principle:

If $\delta_k = (a x_k + b) - y_k$ is the discrepancy in the $k$th observation, then $a$ and $b$
should be chosen so that the quantity

$$\Delta = \sum_{k=1}^{n} \delta_k^2,$$

that is, the sum of the squares of the discrepancies, has a minimum.


a) Show that the least-squares principle for relation (8.79) leads to the following
system of linear equations

$$\begin{cases} [x_k, x_k]\, a + [x_k, 1]\, b = [x_k, y_k], \\ [1, x_k]\, a + [1, 1]\, b = [1, y_k], \end{cases}$$

for determining the coefficients $a$ and $b$. Here, following Gauss, we write
$[x_k, x_k] := x_1 x_1 + \cdots + x_n x_n$, $[x_k, 1] := x_1 \cdot 1 + \cdots + x_n \cdot 1$,
$[x_k, y_k] := x_1 y_1 + \cdots + x_n y_n$, and so forth.

b) Write the system of equations for the numbers $a_1, \ldots, a_m, b$ to which the
least-squares principle leads when Eq. (8.79) is replaced by the relation

$$y = \sum_{i=1}^{m} a_i x^i + b$$

(or, more briefly, $y = a_i x^i + b$) between the quantities $x^1, \ldots, x^m$ and $y$.

c) How can the method of least squares be used to find empirical formulas of
the form

$$y = c\, x_1^{\alpha_1} \cdots x_n^{\alpha_n}$$

connecting physical quantities $x_1, \ldots, x_n$ with the quantity $y$?


d) (M. Germain.) The frequency $R$ of heart contractions was measured at
different temperatures $T$ in several dozen specimens of Nereis diversicolor. The
frequencies were expressed in percents relative to the contraction frequency at 15° C.
The results are given in the following table.

    Temperature, °C    Frequency, %
           0                 39
           5                 54
          10                 74
          15                100
          20                136
          25                182
          30                254

The dependence of $R$ on $T$ appears to be exponential. Assuming $R = A e^{bT}$,
find the values of the constants $A$ and $b$ that best fit the experimental results.
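
The normal equations of part a) are straightforward to solve by machine. The
sketch below is one possible approach under stated assumptions, not the
intended solution of the exercise: the helper `fit_line` is hypothetical, and
for the relation $R = A e^{bT}$ the fit is applied to $\ln R = \ln A + bT$,
so that it minimizes the squared discrepancies of $\ln R$ rather than of $R$
itself.

```python
import math

def fit_line(xs, ys):
    """Solve [x,x]a + [x,1]b = [x,y], [1,x]a + [1,1]b = [1,y] for (a, b)."""
    n = len(xs)
    sxx = sum(x * x for x in xs)              # [x_k, x_k]
    sx = sum(xs)                              # [x_k, 1]
    sxy = sum(x * y for x, y in zip(xs, ys))  # [x_k, y_k]
    sy = sum(ys)                              # [1, y_k]
    det = sxx * n - sx * sx                   # determinant of the 2x2 system
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

# Data of the table in part d).
T = [0, 5, 10, 15, 20, 25, 30]
R = [39, 54, 74, 100, 136, 182, 254]

# Log-linearization: ln R = ln A + b*T is a relation of the form (8.79).
slope, intercept = fit_line(T, [math.log(r) for r in R])
b_fit, A_fit = slope, math.exp(intercept)
print(A_fit, b_fit)
```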


9. a) Show that in Huygens' problem, studied in Example 5, the function (8.71)
tends to zero if at least one of the variables $m_1, \ldots, m_n$ tends to infinity.

b) Show that the function (8.71) has a maximum point in $\mathbb{R}^n$ and hence the
unique critical point of that function in $\mathbb{R}^n$ must be its maximum.

c) Show that the quantity $v$ defined by formula (8.72) is monotonically increasing
as $n$ increases and find its limit as $n \to \infty$.


10. a) During so-called exterior disk grinding the grinding tool, a rapidly rotating
grinding disk (with an abrasive rim) that acts as a file, is brought into contact
with the surface of a circular machine part that is rotating slowly compared with
the disk (see Fig. 8.3).

Fig. 8.3.

The disk $K$ is gradually pressed against the machine part $D$, causing a layer
$H$ of metal to be removed, reducing the part to the required size and producing
a smooth working surface for the device. In the machine where it will be placed
this surface will usually be a working surface. In order to extend its working life,
the metal of the machine part is subjected to a preliminary annealing to harden
the steel. However, because of the high temperature in the contact zone between
the machine part and the grinding disk, structural changes can (and frequently
do) occur in a certain layer $\Delta$ of metal in the machine part, resulting in decreased
hardness of the steel in that layer. The quantity $\Delta$ is a monotonic function of the
rate $s$ at which the disk is applied to the machine part, that is, $\Delta = \varphi(s)$. It is
known that there is a certain critical rate $s_0 > 0$ at which the relation $\Delta = 0$ still
holds, while $\Delta > 0$ whenever $s > s_0$. For the following discussion it is convenient
to introduce the relation

$$s = \psi(\Delta)$$

inverse to the one just given. This new relation is defined for $\Delta \ge 0$.

Here $\psi$ is a monotonically increasing function known experimentally, defined
for $\Delta \ge 0$, and $\psi(0) = s_0 > 0$.

The grinding process must be carried out in such a way that there are no
structural changes in the metal on the surface eventually produced.

In terms of rapidity, the optimal grinding mode under these conditions would
obviously be a set of variations in the rate $s$ of application of the grinding disk for
which

$$s = \psi(\delta),$$

where $\delta = \delta(t)$ is the thickness of the layer of metal not yet removed up to time $t$,
or, what is the same, the distance from the rim of the disk at time $t$ to the final
surface of the device being produced. Explain this.

b) Find the time needed to remove a layer of thickness $H$ when the rate of
application of the disk is optimally adjusted.

c) Find the dependence $s = s(t)$ of the rate of application of the disk on time
in the optimal mode under the condition that the function $\Delta \overset{\psi}{\longmapsto} s$ is linear:
$s = s_0 + \lambda \Delta$.

Due to the structural properties of certain kinds of grinding lathes, the rate
$s$ can undergo only discrete changes. This poses the problem of optimizing the
productivity of the process under the additional condition that only a fixed number
$n$ of switches in the rate $s$ are allowed. The answers to the following questions give
a picture of the optimal mode.

d) What is the geometric interpretation of the grinding time $t(H) = \int_0^H \frac{d\delta}{\psi(\delta)}$ that
you found in part b) for the optimal continuous variation of the rate $s$?

e) What is the geometric interpretation of the time lost in switching from the
optimal continuous mode of variation of $s$ to the time-optimal stepwise mode of
variation of $s$?

f) Show that the points $0 = x_{n+1} < x_n < \cdots < x_1 < x_0 = H$ of the closed
interval $[0, H]$ at which the rate should be switched must satisfy the conditions

$$\frac{1}{\psi(x_{i+1})} - \frac{1}{\psi(x_i)} = -\Bigl(\frac{1}{\psi}\Bigr)'(x_i)\,(x_{i-1} - x_i) \quad (i = 1, \ldots, n)$$

and consequently, on the portion from $x_i$ to $x_{i+1}$, the rate of application of the disk
has the form $s = \psi(x_{i+1})$ $(i = 0, \ldots, n)$.

g) Show that in the linear case, when $\psi(\Delta) = s_0 + \lambda \Delta$, the points $x_i$ (in part
f)) on the closed interval $[0, H]$ are distributed so that the numbers

$$\frac{s_0}{\lambda} < \frac{s_0}{\lambda} + x_n < \cdots < \frac{s_0}{\lambda} + x_1 < \frac{s_0}{\lambda} + H$$

form a geometric progression.


11. a) Verify that the tangent to a curve $\Gamma : I \to \mathbb{R}^m$ is defined invariantly relative
to the choice of coordinate system in $\mathbb{R}^m$.

b) Verify that the tangent plane to the graph $S$ of a function $y = f(x^1, \ldots, x^m)$
is defined invariantly relative to the choice of coordinate system in $\mathbb{R}^m$.

c) Suppose the set $S \subset \mathbb{R}^m \times \mathbb{R}^1$ is the graph of a function $y = f(x^1, \ldots, x^m)$ in
coordinates $(x^1, \ldots, x^m, y)$ in $\mathbb{R}^m \times \mathbb{R}^1$ and the graph of a function $\tilde y = \tilde f(\tilde x^1, \ldots, \tilde x^m)$
in coordinates $(\tilde x^1, \ldots, \tilde x^m, \tilde y)$ in $\mathbb{R}^m \times \mathbb{R}^1$. Verify that the tangent plane to $S$ is
invariant relative to a linear change of coordinates in $\mathbb{R}^m \times \mathbb{R}^1$.

d) Verify that the Laplacian $\Delta f = \sum_{i=1}^{m} \frac{\partial^2 f}{\partial (x^i)^2}(x)$ is defined invariantly relative to
orthogonal coordinate transformations in $\mathbb{R}^m$.


8.5 The Implicit Function Theorem 


8.5.1 Statement of the Problem and Preliminary Considerations 


In this section we shall prove the implicit function theorem, which is important
both intrinsically and because of its numerous applications.
Let us begin by explaining the problem. Suppose, for example, we have
the relation

$$x^2 + y^2 - 1 = 0 \tag{8.80}$$

between the coordinates $x, y$ of points in the plane $\mathbb{R}^2$. The set of all points
of $\mathbb{R}^2$ satisfying this condition is the unit circle (Fig. 8.4).

Fig. 8.4.


The presence of the relation (8.80) shows that after fixing one of the 
coordinates, for example, x, we can no longer choose the second coordinate 
arbitrarily. Thus relation (8.80) determines the dependence of y on x. We are 
interested in the question of the conditions under which the implicit relation 
(8.80) can be solved as an explicit functional dependence y = y(x). 

Solving Eq. (8.80) with respect to $y$, we find that

$$y = \pm\sqrt{1 - x^2}, \tag{8.81}$$

that is, to each value of $x$ such that $|x| < 1$, there are actually two admissible
values of $y$. In forming a functional relation $y = y(x)$ satisfying relation (8.80)
one cannot give preference to either of the values (8.81) without invoking
additional requirements. For example, the function $y(x)$ that assumes the
value $+\sqrt{1 - x^2}$ at rational points of the closed interval $[-1, 1]$ and the value
$-\sqrt{1 - x^2}$ at irrational points obviously satisfies (8.80).

It is clear that one can create infinitely many functional relations satisfy- 
ing (8.80) by varying this example. 

The question whether the set defined in R? by (8.80) is the graph of a 
function y = y(x) obviously has a negative answer, since from the geometric 
point of view it is equivalent to the question whether it is possible to establish 
a one-to-one direct projection of a circle into a line. 

But observation (see Fig. 8.4) suggests that nevertheless, in a neighborhood
of a particular point $(x_0, y_0)$ the arc projects in a one-to-one manner
into the $x$-axis, and that it can be represented uniquely as $y = y(x)$, where
$x \mapsto y(x)$ is a continuous function defined in a neighborhood of the point
$x_0$ and assuming the value $y_0$ at $x_0$. In this aspect, the only bad points are
$(-1,0)$ and $(1,0)$, since no arc of the circle having them as interior points
projects in a one-to-one manner into the $x$-axis. Even so, neighborhoods of
these points on the circle are well situated relative to the $y$-axis, and can
be represented as the graph of a function $x = x(y)$ that is continuous in a
neighborhood of the point 0 and assumes the value $-1$ or $1$ according as the
arc in question contains the point $(-1,0)$ or $(1,0)$.

How is it possible to find out analytically when a geometric locus of points
defined by a relation of the type (8.80) can be represented in the form of an
explicit function $y = y(x)$ or $x = x(y)$ in a neighborhood of a point $(x_0, y_0)$
on the locus?

We shall discuss this question using the following, now familiar, method.
We have a function $F(x,y) = x^2 + y^2 - 1$. The local behavior of this function
in a neighborhood of a point $(x_0, y_0)$ is well described by its differential

$$F'_x(x_0, y_0)(x - x_0) + F'_y(x_0, y_0)(y - y_0),$$

since

$$F(x,y) = F(x_0, y_0) + F'_x(x_0, y_0)(x - x_0) + F'_y(x_0, y_0)(y - y_0) + o(|x - x_0| + |y - y_0|)$$

as $(x,y) \to (x_0, y_0)$.
If F(x, yo) = 0 and we are interested in the behavior of the level curve 


F(z, y) =0 


of the function in a neighborhood of the point (x9, yo), we can judge that 
behavior from the position of the (tangent) line 


F (Zo, Yo) (£ — Xo) + Fy (Xo, yo)(y — Yo) = 0. (8.82) 


If this line is situated so that its equation can be solved with respect to 
y, then, since the curve F(x,y) = 0 differs very little from this line in a 
neighborhood of the point (£o, yo), we may hope that it also can be written 
in the form y = y(x) in some neighborhood of the point (£o, yo). 

The same can be said about local solvability of F(x,y) = 0 with respect 
to x. 

Writing Eq. (8.82) for the specific relation (8.80), we obtain the following
equation for the tangent line:

$$x_0(x - x_0) + y_0(y - y_0) = 0.$$

This equation can always be solved for y when $y_0 \ne 0$, that is, at all points
of the circle (8.80) except $(-1,0)$ and $(1,0)$. It is solvable with respect to x
at all points of the circle except $(0,-1)$ and $(0,1)$.
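
The criterion just described can be checked mechanically. The following is a minimal sketch for the circle $F(x,y) = x^2 + y^2 - 1$; the function names and the tolerance `tol` are illustrative assumptions, not notation from the text. It evaluates the two partial derivatives at a point of the circle and reports the variables for which the tangent-line equation (8.82) can be solved.

```python
import math

def grad_F(x, y):
    """Partial derivatives (F'_x, F'_y) of F(x, y) = x**2 + y**2 - 1."""
    return 2.0 * x, 2.0 * y

def solvable_variables(x0, y0, tol=1e-12):
    """List the variables for which the tangent line (8.82) at (x0, y0) can be solved."""
    Fx, Fy = grad_F(x0, y0)
    result = []
    if abs(Fy) > tol:
        result.append("y")  # F'_y != 0: the line can be written as y = y(x)
    if abs(Fx) > tol:
        result.append("x")  # F'_x != 0: the line can be written as x = x(y)
    return result

if __name__ == "__main__":
    s = math.sqrt(0.5)
    for point in [(1.0, 0.0), (0.0, 1.0), (s, s)]:
        print(point, solvable_variables(*point))
```

At the points $(\pm 1, 0)$ only `"x"` is reported and at $(0, \pm 1)$ only `"y"`, in agreement with the discussion above; the comparison with a tolerance rather than with exact zero is a design choice made for floating-point input.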


8.5.2 An Elementary Version of the Implicit Function Theorem 


In this section we shall obtain the implicit function theorem by a very in- 
tuitive, but not very constructive method, one that is adapted only to the 
case of real-valued functions of real variables. The reader can become famil- 
iar with another method of obtaining this theorem, one that is in many ways 
preferable, and with a more detailed analysis of its structure in Chap. 10 
(Part 2), and also in Problem 4 at the end of the section. 

The following proposition is an elementary version of the implicit function
theorem.

Proposition 1. If the function $F : U(x_0, y_0) \to \mathbb{R}$ defined in a neighborhood
$U(x_0, y_0)$ of the point $(x_0, y_0) \in \mathbb{R}^2$ is such that

1° $F \in C^{(p)}(U; \mathbb{R})$, where $p \ge 1$,
2° $F(x_0, y_0) = 0$,
3° $F'_y(x_0, y_0) \ne 0$,

then there exist a two-dimensional interval $I = I_x \times I_y$, where

$$I_x = \{x \in \mathbb{R} \mid |x - x_0| < \alpha\}, \qquad I_y = \{y \in \mathbb{R} \mid |y - y_0| < \beta\},$$

that is a neighborhood of the point $(x_0, y_0)$ contained in $U(x_0, y_0)$, and a
function $f \in C^{(p)}(I_x; I_y)$ such that

$$F(x,y) = 0 \iff y = f(x), \tag{8.83}$$

for any point $(x,y) \in I_x \times I_y$, and the derivative of the function $y = f(x)$ at
the points $x \in I_x$ can be computed from the formula

$$f'(x) = -\bigl[F'_y(x, f(x))\bigr]^{-1}\bigl[F'_x(x, f(x))\bigr]. \tag{8.84}$$


Before taking up the proof, we shall give several possible reformulations 
of the conclusion (8.83), which should bring out the meaning of the relation 
itself. 

Proposition 1 says that under hypotheses 1°, 2°, and 3° the portion of
the set defined by the relation $F(x,y) = 0$ that belongs to the neighborhood
$I_x \times I_y$ of the point $(x_0, y_0)$ is the graph of a function $f : I_x \to I_y$ of class
$C^{(p)}(I_x; I_y)$.

In other words, one can say that inside the neighborhood $I$ of the point
$(x_0, y_0)$ the equation $F(x,y) = 0$ has a unique solution for y, and the function
$y = f(x)$ is that solution, that is, $F(x, f(x)) \equiv 0$ on $I_x$.

It follows in turn from this that if $y = \tilde f(x)$ is a function defined on $I_x$ that
is known to satisfy the relation $F(x, \tilde f(x)) \equiv 0$ on $I_x$, $\tilde f(x_0) = y_0$, and this
function is continuous at the point $x_0 \in I_x$, then there exists a neighborhood
$\Delta \subset I_x$ of $x_0$ such that $\tilde f(\Delta) \subset I_y$, and then $\tilde f(x) \equiv f(x)$ for $x \in \Delta$.

Without the assumption that the function $\tilde f$ is continuous at the point
$x_0$ and the condition $\tilde f(x_0) = y_0$, this last conclusion could turn out to be
incorrect, as can be seen from the example of the circle already studied.

Let us now prove Proposition 1. 


Proof. Suppose for definiteness that $F'_y(x_0, y_0) > 0$. Since $F \in C^{(1)}(U; \mathbb{R})$,
it follows that $F'_y(x,y) > 0$ also in some neighborhood of $(x_0, y_0)$. In order
to avoid introducing new notation, we can assume without loss of generality
that $F'_y(x,y) > 0$ at every point of the original neighborhood $U(x_0, y_0)$.

Moreover, shrinking the neighborhood $U(x_0, y_0)$ if necessary, we can assume
that it is a disk of radius $r = 2\beta > 0$ with center at $(x_0, y_0)$.

Since $F'_y(x,y) > 0$ in $U$, the function $F(x_0, y)$ is defined and monotonically
increasing as a function of y on the closed interval $y_0 - \beta \le y \le y_0 + \beta$.
Consequently,

$$F(x_0, y_0 - \beta) < F(x_0, y_0) = 0 < F(x_0, y_0 + \beta).$$

By the continuity of the function $F$ in $U$, there exists a positive number
$\alpha < \beta$ such that the relations

$$F(x, y_0 - \beta) < 0 < F(x, y_0 + \beta)$$

hold for $|x - x_0| \le \alpha$.

We shall now show that the rectangle $I = I_x \times I_y$, where

$$I_x = \{x \in \mathbb{R} \mid |x - x_0| < \alpha\}, \qquad I_y = \{y \in \mathbb{R} \mid |y - y_0| < \beta\},$$

is the required two-dimensional interval in which relation (8.83) holds.

For each $x \in I_x$ we fix the vertical closed interval with endpoints
$(x, y_0 - \beta)$, $(x, y_0 + \beta)$. Regarding $F(x,y)$ as a function of y on that closed interval,
we obtain a strictly increasing continuous function that assumes values of
opposite sign at the endpoints of the interval. Consequently, for each $x \in I_x$,
there is a unique point $y(x) \in I_y$ such that $F(x, y(x)) = 0$. Setting $y(x) = f(x)$,
we arrive at relation (8.83).

We now establish that $f \in C^{(p)}(I_x; I_y)$.

We begin by showing that the function $f$ is continuous at $x_0$ and that
$f(x_0) = y_0$. This last equality obviously follows from the fact that for $x = x_0$
there is a unique point $y(x_0) \in I_y$ such that $F(x_0, y(x_0)) = 0$. At the same
time, $F(x_0, y_0) = 0$, and so $f(x_0) = y_0$.

Given a number $\varepsilon$, $0 < \varepsilon < \beta$, we can repeat the proof of the existence
of the function $f(x)$ and find a number $\delta$, $0 < \delta < \alpha$, such that in the
two-dimensional interval $\tilde I = \tilde I_x \times \tilde I_y$, where

$$\tilde I_x = \{x \in \mathbb{R} \mid |x - x_0| < \delta\}, \qquad \tilde I_y = \{y \in \mathbb{R} \mid |y - y_0| < \varepsilon\},$$

the relation

$$\bigl(F(x,y) = 0 \text{ in } \tilde I\bigr) \iff \bigl(y = \tilde f(x),\ x \in \tilde I_x\bigr) \tag{8.85}$$

holds with a new function $\tilde f : \tilde I_x \to \tilde I_y$.

But $\tilde I_x \subset I_x$, $\tilde I_y \subset I_y$, and $\tilde I \subset I$, and therefore it follows from (8.83)
and (8.85) that $\tilde f(x) \equiv f(x)$ for $x \in \tilde I_x \subset I_x$. We have thus verified that
$|f(x) - f(x_0)| = |f(x) - y_0| < \varepsilon$ for $|x - x_0| < \delta$.

We have now established that the function $f$ is continuous at the point
$x_0$. But any point $(x,y) \in I$ at which $F(x,y) = 0$ can also be taken as
the initial point of the construction, since conditions 2° and 3° hold at that
point. Carrying out that construction inside the interval $I$, we would once
again arrive via (8.83) at the corresponding part of the function $f$ considered
in a neighborhood of $x$. Hence the function $f$ is continuous at $x$. Thus we
have established that $f \in C(I_x; I_y)$.
We shall now show that $f \in C^{(1)}(I_x; I_y)$ and establish formula (8.84).

Let the number $\Delta x$ be such that $x + \Delta x \in I_x$. Let $y = f(x)$ and
$y + \Delta y = f(x + \Delta x)$. Applying the mean-value theorem to the function $F(x,y)$ inside
the interval $I$, we find that

$$0 = F(x + \Delta x, f(x + \Delta x)) - F(x, f(x)) = F(x + \Delta x, y + \Delta y) - F(x,y) =$$
$$= F'_x(x + \theta\Delta x, y + \theta\Delta y)\,\Delta x + F'_y(x + \theta\Delta x, y + \theta\Delta y)\,\Delta y \qquad (0 < \theta < 1),$$

from which, taking account of the relation $F'_y(x,y) \ne 0$ in $I$, we obtain

$$\frac{\Delta y}{\Delta x} = -\frac{F'_x(x + \theta\Delta x, y + \theta\Delta y)}{F'_y(x + \theta\Delta x, y + \theta\Delta y)}. \tag{8.86}$$

Since $f \in C(I_x; I_y)$, it follows that $\Delta y \to 0$ as $\Delta x \to 0$, and, taking
account of the relation $F \in C^{(1)}(U; \mathbb{R})$, as $\Delta x \to 0$ in (8.86), we obtain

$$f'(x) = -\frac{F'_x(x,y)}{F'_y(x,y)},$$

where $y = f(x)$. Thus formula (8.84) is now established.

By the theorem on continuity of composite functions, it follows from
formula (8.84) that $f \in C^{(1)}(I_x; I_y)$.

If $F \in C^{(2)}(U; \mathbb{R})$, the right-hand side of formula (8.84) can be differentiated
with respect to x, and we find that

$$f''(x) = -\frac{\bigl[F''_{xx} + F''_{xy}\, f'(x)\bigr]F'_y - F'_x\bigl[F''_{xy} + F''_{yy}\, f'(x)\bigr]}{(F'_y)^2}, \tag{8.84'}$$

where $F'_x$, $F'_y$, $F''_{xx}$, $F''_{xy}$, and $F''_{yy}$ are all computed at the point $(x, f(x))$.

Thus $f \in C^{(2)}(I_x; I_y)$ if $F \in C^{(2)}(U; \mathbb{R})$. Since the order of the derivatives
of $f$ on the right-hand side of (8.84), (8.84'), and so forth, is one less than
the order on the left-hand side of the equality, we find by induction that
$f \in C^{(p)}(I_x; I_y)$ if $F \in C^{(p)}(U; \mathbb{R})$. $\square$
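
The existence part of the proof is in effect an algorithm: for fixed x the function $y \mapsto F(x,y)$ is strictly increasing and changes sign on $[y_0 - \beta, y_0 + \beta]$, so its unique zero can be located by bisection. The sketch below illustrates this under stated assumptions: the relation $F(x,y) = y + \tfrac12 \sin y - x$ with $(x_0, y_0) = (0, 0)$, the value $\beta = 1$, and all function names are chosen here for illustration and do not come from the text. It also evaluates $f'(x)$ by formula (8.84) without any explicit expression for $f$.

```python
import math

def F(x, y):
    """Illustrative relation F(x, y) = y + 0.5*sin(y) - x; F(0, 0) = 0."""
    return y + 0.5 * math.sin(y) - x

def F_x(x, y):
    """Partial derivative of F with respect to x."""
    return -1.0

def F_y(x, y):
    """Partial derivative of F with respect to y; positive everywhere."""
    return 1.0 + 0.5 * math.cos(y)

def implicit_f(x, y0=0.0, beta=1.0, iterations=60):
    """Unique zero of y -> F(x, y) on [y0 - beta, y0 + beta], found by bisection."""
    lo, hi = y0 - beta, y0 + beta
    if not (F(x, lo) < 0.0 < F(x, hi)):
        raise ValueError("no sign change: x lies outside I_x for this beta")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if F(x, mid) < 0.0:
            lo = mid  # the zero lies above mid, since F increases in y
        else:
            hi = mid
    return 0.5 * (lo + hi)

def derivative_by_formula(x):
    """f'(x) computed from formula (8.84): -F'_x / F'_y at (x, f(x))."""
    y = implicit_f(x)
    return -F_x(x, y) / F_y(x, y)

if __name__ == "__main__":
    x = 0.3
    h = 1e-6
    finite_difference = (implicit_f(x + h) - implicit_f(x - h)) / (2.0 * h)
    print(implicit_f(x), derivative_by_formula(x), finite_difference)
```

Bisection is used here only because it mirrors the monotonicity argument of the proof; it is not claimed to be an efficient way of evaluating implicit functions.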


Example 1. Let us return to relation (8.80) studied above, which defines a
circle in $\mathbb{R}^2$, and verify Proposition 1 on this example.

In this case

$$F(x,y) = x^2 + y^2 - 1,$$

and it is obvious that $F \in C^{(\infty)}(\mathbb{R}^2; \mathbb{R})$. Next,

$$F'_x(x,y) = 2x, \qquad F'_y(x,y) = 2y,$$

so that $F'_y(x,y) \ne 0$ if $y \ne 0$. Thus, by Proposition 1, for any point $(x_0, y_0)$ of
this circle different from the points $(-1,0)$ and $(1,0)$ there is a neighborhood
such that the arc of the circle contained in that neighborhood can be written
in the form $y = f(x)$. Direct computation confirms this, and $f(x) = \sqrt{1 - x^2}$
or $f(x) = -\sqrt{1 - x^2}$.

Next, by Proposition 1,

$$f'(x_0) = -\frac{F'_x(x_0, y_0)}{F'_y(x_0, y_0)} = -\frac{x_0}{y_0}. \tag{8.87}$$

Direct computation yields

$$f'(x) = \begin{cases} -\dfrac{x}{\sqrt{1 - x^2}}, & \text{if } f(x) = \sqrt{1 - x^2}, \\[2ex] \dfrac{x}{\sqrt{1 - x^2}}, & \text{if } f(x) = -\sqrt{1 - x^2}, \end{cases}$$

which can be written as the single expression

$$f'(x) = -\frac{x}{f(x)} = -\frac{x}{y},$$

and computation with it leads to the same result,

$$f'(x_0) = -\frac{x_0}{y_0},$$

as computation from formula (8.87) obtained from Proposition 1.

It is important to note that formula (8.84) or (8.87) makes it possible
to compute $f'(x)$ without even having an explicit expression for the relation
$y = f(x)$, if only we know that $f(x_0) = y_0$. The condition $y_0 = f(x_0)$ must
be prescribed, however, in order to distinguish the portion of the level curve
$F(x,y) = 0$ that we intend to describe in the form $y = f(x)$.

It is clear from the example of the circle that giving only the coordinate
$x_0$ does not determine an arc of the circle, and only after fixing $y_0$ have we
distinguished one of the two possible arcs in this case.
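
The comparison carried out in Example 1 can also be made numerically. The following sketch (the helper names and the step `h` are illustrative assumptions) evaluates $-x_0/y_0$ from formula (8.87) on each of the two arcs and compares it with a central difference quotient of the corresponding explicit branch; it shows in passing that the same $x_0$ gives two different slopes, depending on which $y_0$ is fixed.

```python
import math

def upper(x):
    """Upper arc of the unit circle, f(x) = sqrt(1 - x^2)."""
    return math.sqrt(1.0 - x * x)

def lower(x):
    """Lower arc of the unit circle, f(x) = -sqrt(1 - x^2)."""
    return -math.sqrt(1.0 - x * x)

def slope_from_formula(x0, y0):
    """f'(x0) = -F'_x / F'_y = -x0 / y0 for F(x, y) = x^2 + y^2 - 1."""
    if y0 == 0.0:
        raise ValueError("hypothesis 3 fails at (-1, 0) and (1, 0)")
    return -x0 / y0

def central_difference(f, x0, h=1e-6):
    """Symmetric difference quotient approximating f'(x0)."""
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)

if __name__ == "__main__":
    x0 = 0.6
    for branch in (upper, lower):
        y0 = branch(x0)
        print(y0, slope_from_formula(x0, y0), central_difference(branch, x0))
```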


8.5.3 Transition to the Case of a Relation $F(x^1, \dots, x^m, y) = 0$


The following proposition is a simple generalization of Proposition 1 to the
case of a relation $F(x^1, \dots, x^m, y) = 0$.


Proposition 2. If a function $F : U \to \mathbb{R}$ defined in a neighborhood $U \subset \mathbb{R}^{m+1}$
of the point $(x_0, y_0) = (x_0^1, \dots, x_0^m, y_0) \in \mathbb{R}^{m+1}$ is such that

1° $F \in C^{(p)}(U; \mathbb{R})$, $p \ge 1$,

2° $F(x_0, y_0) = F(x_0^1, \dots, x_0^m, y_0) = 0$,

3° $F'_y(x_0, y_0) = F'_y(x_0^1, \dots, x_0^m, y_0) \ne 0$,

then there exists an $(m + 1)$-dimensional interval $I = I_x^m \times I_y^1$, where

$$I_x^m = \{x = (x^1, \dots, x^m) \in \mathbb{R}^m \mid |x^i - x_0^i| < \alpha^i,\ i = 1, \dots, m\},$$
$$I_y^1 = \{y \in \mathbb{R} \mid |y - y_0| < \beta\},$$

which is a neighborhood of the point $(x_0, y_0)$ contained in $U$, and a function
$f \in C^{(p)}(I_x^m; I_y^1)$ such that for any point $(x,y) \in I_x^m \times I_y^1$

$$F(x^1, \dots, x^m, y) = 0 \iff y = f(x^1, \dots, x^m), \tag{8.88}$$

and the partial derivatives of the function $y = f(x^1, \dots, x^m)$ at the points of
$I_x^m$ can be computed from the formula

$$\frac{\partial f}{\partial x^i}(x) = -\bigl[F'_y(x, f(x))\bigr]^{-1}\bigl[F'_{x^i}(x, f(x))\bigr]. \tag{8.89}$$

Proof. The proof of the existence of the interval $I^{m+1} = I_x^m \times I_y^1$ and the
existence of the function $y = f(x) = f(x^1, \dots, x^m)$ and its continuity in $I_x^m$
is a verbatim repetition of the corresponding part of the proof of Proposition
1, with only a single change, which reduces to the fact that the symbol $x$
must now be interpreted as $(x^1, \dots, x^m)$ and $\alpha$ as $(\alpha^1, \dots, \alpha^m)$.

If we now fix all the variables in the functions $F(x^1, \dots, x^m, y)$ and
$f(x^1, \dots, x^m)$ except $x^i$ and $y$, we have the hypotheses of Proposition 1,
where now the role of $x$ is played by the variable $x^i$. Formula (8.89) follows
from this. It is clear from this formula that $\frac{\partial f}{\partial x^i} \in C(I_x^m; \mathbb{R})$ $(i = 1, \dots, m)$,
that is, $f \in C^{(1)}(I_x^m; I_y^1)$. Reasoning as in the proof of Proposition 1, we
establish by induction that $f \in C^{(p)}(I_x^m; I_y^1)$ when $F \in C^{(p)}(U; \mathbb{R})$. $\square$
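
Formula (8.89) can be tried out on a relation that happens to be explicitly solvable, so that the result can be checked. The sketch below assumes the relation $F(x^1, x^2, y) = (x^1)^2 + 2(x^2)^2 + 3y^2 - 6$ and the point $(1, 1, 1)$; this example and the function names are chosen here for illustration only. It compares the partial derivatives given by (8.89) with difference quotients of the explicit solution.

```python
import math

def grad_x_F(x, y):
    """Partial derivatives of F = x1^2 + 2*x2^2 + 3*y^2 - 6 with respect to x1, x2."""
    return [2.0 * x[0], 4.0 * x[1]]

def F_y(x, y):
    """Partial derivative of F with respect to y."""
    return 6.0 * y

def partials_by_formula(x, y):
    """Formula (8.89): df/dx^i = -F'_{x^i} / F'_y at the point (x, y), y = f(x)."""
    Fy = F_y(x, y)
    return [-Fxi / Fy for Fxi in grad_x_F(x, y)]

def f_explicit(x):
    """Explicit solution y = f(x1, x2) on the branch y > 0 (available only in this example)."""
    return math.sqrt((6.0 - x[0] ** 2 - 2.0 * x[1] ** 2) / 3.0)

def partials_by_differences(x, h=1e-6):
    """Central difference quotients of the explicit solution."""
    result = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        result.append((f_explicit(xp) - f_explicit(xm)) / (2.0 * h))
    return result

if __name__ == "__main__":
    x0 = [1.0, 1.0]
    y0 = f_explicit(x0)
    print(partials_by_formula(x0, y0))
    print(partials_by_differences(x0))
```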




Example 2. Assume that the function $F : G \to \mathbb{R}$ is defined in a domain
$G \subset \mathbb{R}^m$ and belongs to the class $C^{(1)}(G; \mathbb{R})$; $x_0 = (x_0^1, \dots, x_0^m) \in G$ and
$F(x_0) = F(x_0^1, \dots, x_0^m) = 0$. If $x_0$ is not a critical point of $F$, then at least
one of the partial derivatives of $F$ at $x_0$ is nonzero. Suppose, for example,
that $\frac{\partial F}{\partial x^m}(x_0) \ne 0$.

Then, by Proposition 2, in some neighborhood of $x_0$ the subset of $\mathbb{R}^m$
defined by the equation $F(x^1, \dots, x^m) = 0$ can be defined as the graph of
a function $x^m = f(x^1, \dots, x^{m-1})$, defined in a neighborhood of the point
$(x_0^1, \dots, x_0^{m-1}) \in \mathbb{R}^{m-1}$, that is continuously differentiable in this neighborhood
and such that $f(x_0^1, \dots, x_0^{m-1}) = x_0^m$.

Thus, in a neighborhood of a noncritical point $x_0$ of $F$ the equation

$$F(x^1, \dots, x^m) = 0$$

defines an $(m - 1)$-dimensional surface.

In particular, in the case of $\mathbb{R}^3$ the equation

$$F(x, y, z) = 0$$

defines a two-dimensional surface in a neighborhood of a noncritical
point $(x_0, y_0, z_0)$ satisfying the equation, which, when the condition
$\frac{\partial F}{\partial z}(x_0, y_0, z_0) \ne 0$ holds, can be locally written in the form

$$z = f(x, y).$$

As we know, the equation of the plane tangent to the graph of this function
at the point $(x_0, y_0, z_0)$ has the form

$$z - z_0 = \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0).$$

But by formula (8.89)

$$\frac{\partial f}{\partial x}(x_0, y_0) = -\frac{F'_x(x_0, y_0, z_0)}{F'_z(x_0, y_0, z_0)}, \qquad \frac{\partial f}{\partial y}(x_0, y_0) = -\frac{F'_y(x_0, y_0, z_0)}{F'_z(x_0, y_0, z_0)},$$

and therefore the equation of the tangent plane can be rewritten as

$$F'_x(x_0, y_0, z_0)(x - x_0) + F'_y(x_0, y_0, z_0)(y - y_0) + F'_z(x_0, y_0, z_0)(z - z_0) = 0,$$

which is symmetric in the variables x, y, z.

Similarly, in the general case we obtain the equation

$$\sum_{i=1}^{m} F'_{x^i}(x_0)(x^i - x_0^i) = 0$$

of the hyperplane in $\mathbb{R}^m$ tangent at the point $x_0 = (x_0^1, \dots, x_0^m)$ to the surface
given by the equation $F(x^1, \dots, x^m) = 0$ (naturally, under the assumptions
that $F(x_0) = 0$ and that $x_0$ is a noncritical point of $F$).

It can be seen from these equations that, given the Euclidean structure
on $\mathbb{R}^m$, one can assert that the vector

$$\operatorname{grad} F(x_0) = \Bigl(\frac{\partial F}{\partial x^1}, \dots, \frac{\partial F}{\partial x^m}\Bigr)(x_0)$$

is orthogonal to the r-level surface $F(x) = r$ of the function $F$ at a
corresponding point $x_0 \in \mathbb{R}^m$.

For example, for the function

$$F(x, y, z) = \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2},$$

defined in $\mathbb{R}^3$, the r-level is the empty set if $r < 0$, a single point if $r = 0$,
and the ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = r$$

if $r > 0$. If $(x_0, y_0, z_0)$ is a point on this ellipsoid, then by what has been
proved, the vector

$$\operatorname{grad} F(x_0, y_0, z_0) = \Bigl(\frac{2x_0}{a^2}, \frac{2y_0}{b^2}, \frac{2z_0}{c^2}\Bigr)$$

is orthogonal to this ellipsoid at the point $(x_0, y_0, z_0)$, and the tangent plane
to it at this point has the equation

$$\frac{x_0(x - x_0)}{a^2} + \frac{y_0(y - y_0)}{b^2} + \frac{z_0(z - z_0)}{c^2} = 0,$$

which, when we take account of the fact that the point $(x_0, y_0, z_0)$ lies on the
ellipsoid, can be rewritten as

$$\frac{x_0 x}{a^2} + \frac{y_0 y}{b^2} + \frac{z_0 z}{c^2} = r.$$

8.5.4 The Implicit Function Theorem 


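We now turn to the general case of a system of equations

$$\begin{cases} F^1(x^1, \dots, x^m, y^1, \dots, y^n) = 0, \\ \quad\cdots \\ F^n(x^1, \dots, x^m, y^1, \dots, y^n) = 0, \end{cases} \tag{8.90}$$

which we shall solve with respect to $y^1, \dots, y^n$, that is, find a system of
functional relations

$$\begin{cases} y^1 = f^1(x^1, \dots, x^m), \\ \quad\cdots \\ y^n = f^n(x^1, \dots, x^m), \end{cases} \tag{8.91}$$

locally equivalent to the system (8.90).

For the sake of brevity, convenience in writing, and clarity of statement,
let us agree that $x = (x^1, \dots, x^m)$, $y = (y^1, \dots, y^n)$. We shall write the
left-hand side of the system (8.90) as $F(x,y)$, the system of equations (8.90) as
$F(x,y) = 0$, and the mapping (8.91) as $y = f(x)$.

If

$$x_0 = (x_0^1, \dots, x_0^m), \quad y_0 = (y_0^1, \dots, y_0^n), \quad \alpha = (\alpha^1, \dots, \alpha^m), \quad \beta = (\beta^1, \dots, \beta^n),$$

the notation $|x - x_0| < \alpha$ or $|y - y_0| < \beta$ will mean that $|x^i - x_0^i| < \alpha^i$
$(i = 1, \dots, m)$ or $|y^j - y_0^j| < \beta^j$ $(j = 1, \dots, n)$ respectively.

We next set

$$f'(x) = \begin{pmatrix} \frac{\partial f^1}{\partial x^1} & \cdots & \frac{\partial f^1}{\partial x^m} \\ \vdots & \ddots & \vdots \\ \frac{\partial f^n}{\partial x^1} & \cdots & \frac{\partial f^n}{\partial x^m} \end{pmatrix}(x), \tag{8.92}$$

$$F'_x(x,y) = \begin{pmatrix} \frac{\partial F^1}{\partial x^1} & \cdots & \frac{\partial F^1}{\partial x^m} \\ \vdots & \ddots & \vdots \\ \frac{\partial F^n}{\partial x^1} & \cdots & \frac{\partial F^n}{\partial x^m} \end{pmatrix}(x,y), \tag{8.93}$$

$$F'_y(x,y) = \begin{pmatrix} \frac{\partial F^1}{\partial y^1} & \cdots & \frac{\partial F^1}{\partial y^n} \\ \vdots & \ddots & \vdots \\ \frac{\partial F^n}{\partial y^1} & \cdots & \frac{\partial F^n}{\partial y^n} \end{pmatrix}(x,y). \tag{8.94}$$

We remark that the matrix $F'_y(x,y)$ is square and hence invertible if and
only if its determinant is nonzero. In the case $n = 1$, it reduces to a single
element, and in that case the invertibility of $F'_y(x,y)$ is equivalent to the
condition that that single element is nonzero. As usual, we shall denote the
matrix inverse to $F'_y(x,y)$ by $\bigl[F'_y(x,y)\bigr]^{-1}$.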
We now state the main result of the present section. 


Theorem 1. (Implicit function theorem). If the mapping $F : U \to \mathbb{R}^n$ defined
in a neighborhood $U$ of the point $(x_0, y_0) \in \mathbb{R}^{m+n}$ is such that

1° $F \in C^{(p)}(U; \mathbb{R}^n)$, $p \ge 1$,
2° $F(x_0, y_0) = 0$,
3° $F'_y(x_0, y_0)$ is an invertible matrix,

then there exists an $(m + n)$-dimensional interval $I = I_x^m \times I_y^n \subset U$, where

$$I_x^m = \{x \in \mathbb{R}^m \mid |x - x_0| < \alpha\}, \qquad I_y^n = \{y \in \mathbb{R}^n \mid |y - y_0| < \beta\},$$

and a mapping $f \in C^{(p)}(I_x^m; I_y^n)$ such that

$$F(x,y) = 0 \iff y = f(x), \tag{8.95}$$

for any point $(x,y) \in I_x^m \times I_y^n$ and

$$f'(x) = -\bigl[F'_y(x, f(x))\bigr]^{-1}\bigl[F'_x(x, f(x))\bigr]. \tag{8.96}$$


Proof. The proof of the theorem will rely on Proposition 2 and the elemen- 
tary properties of determinants. We shall break it into stages, reasoning by 
induction. 
For n = 1, the theorem is the same as Proposition 2 and is therefore true. 
Suppose the theorem is true for dimension n — 1. We shall show that it is 
then valid for dimension n. 


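a) By hypothesis 3°, the determinant of the matrix (8.94) is nonzero at
the point $(x_0, y_0) \in \mathbb{R}^{m+n}$ and hence in some neighborhood of the point
$(x_0, y_0)$. Consequently at least one element of the last row of this matrix is
nonzero. Up to a change in the notation, we may assume that the element
$\frac{\partial F^n}{\partial y^n}$ is nonzero.

b) Then applying Proposition 2 to the relation

$$F^n(x^1, \dots, x^m, y^1, \dots, y^n) = 0,$$

we find an interval $\tilde I^{m+n} = (\tilde I_x^m \times \tilde I_y^{n-1}) \times I_y^1 \subset U$ and a function
$\tilde f \in C^{(p)}(\tilde I_x^m \times \tilde I_y^{n-1}; I_y^1)$ such that

$$\bigl(F^n(x^1, \dots, x^m, y^1, \dots, y^n) = 0 \text{ in } \tilde I^{m+n}\bigr) \iff$$
$$\iff \bigl(y^n = \tilde f(x^1, \dots, x^m, y^1, \dots, y^{n-1}),\ (x^1, \dots, x^m) \in \tilde I_x^m,\ (y^1, \dots, y^{n-1}) \in \tilde I_y^{n-1}\bigr). \tag{8.97}$$

c) Substituting the resulting expression $y^n = \tilde f(x, y^1, \dots, y^{n-1})$ for the
variable $y^n$ in the first $n - 1$ equations of (8.90), we obtain $n - 1$ relations

$$\Phi^i(x^1, \dots, x^m, y^1, \dots, y^{n-1}) := F^i\bigl(x^1, \dots, x^m, y^1, \dots, y^{n-1}, \tilde f(x^1, \dots, x^m, y^1, \dots, y^{n-1})\bigr) = 0 \quad (i = 1, \dots, n-1). \tag{8.98}$$

It is clear that $\Phi^i \in C^{(p)}(\tilde I_x^m \times \tilde I_y^{n-1}; \mathbb{R})$ $(i = 1, \dots, n-1)$, and

$$\Phi^i(x_0^1, \dots, x_0^m; y_0^1, \dots, y_0^{n-1}) = 0 \quad (i = 1, \dots, n-1),$$

since $\tilde f(x_0^1, \dots, x_0^m, y_0^1, \dots, y_0^{n-1}) = y_0^n$ and $F^i(x_0, y_0) = 0$ $(i = 1, \dots, n)$.

By definition of the functions $\Phi^k$ $(k = 1, \dots, n-1)$,

$$\frac{\partial \Phi^k}{\partial y^i} = \frac{\partial F^k}{\partial y^i} + \frac{\partial F^k}{\partial y^n} \cdot \frac{\partial \tilde f}{\partial y^i} \quad (i, k = 1, \dots, n-1). \tag{8.99}$$

Further setting

$$\Phi^n(x^1, \dots, x^m, y^1, \dots, y^{n-1}) := F^n\bigl(x^1, \dots, x^m, y^1, \dots, y^{n-1}, \tilde f(x^1, \dots, x^m, y^1, \dots, y^{n-1})\bigr),$$

we find by (8.97) that $\Phi^n \equiv 0$ in its domain of definition, and therefore

$$\frac{\partial \Phi^n}{\partial y^i} = \frac{\partial F^n}{\partial y^i} + \frac{\partial F^n}{\partial y^n} \cdot \frac{\partial \tilde f}{\partial y^i} \equiv 0 \quad (i = 1, \dots, n-1). \tag{8.100}$$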

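Taking account of relations (8.99) and (8.100) and the properties of
determinants, we can now observe that the determinant of the matrix (8.94)
equals the determinant of the matrix obtained from it by adding to the i-th
column $(i = 1, \dots, n-1)$ the last column multiplied by $\frac{\partial \tilde f}{\partial y^i}$, that is, the
determinant of the matrix

$$\begin{pmatrix} \frac{\partial \Phi^1}{\partial y^1} & \cdots & \frac{\partial \Phi^1}{\partial y^{n-1}} & \frac{\partial F^1}{\partial y^n} \\ \vdots & \ddots & \vdots & \vdots \\ \frac{\partial \Phi^{n-1}}{\partial y^1} & \cdots & \frac{\partial \Phi^{n-1}}{\partial y^{n-1}} & \frac{\partial F^{n-1}}{\partial y^n} \\ 0 & \cdots & 0 & \frac{\partial F^n}{\partial y^n} \end{pmatrix}.$$

By assumption, $\frac{\partial F^n}{\partial y^n} \ne 0$, and the determinant of the matrix (8.94) is
nonzero. Consequently, in some neighborhood of $(x_0^1, \dots, x_0^m, y_0^1, \dots, y_0^{n-1})$ the
determinant of the matrix

$$\begin{pmatrix} \frac{\partial \Phi^1}{\partial y^1} & \cdots & \frac{\partial \Phi^1}{\partial y^{n-1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial \Phi^{n-1}}{\partial y^1} & \cdots & \frac{\partial \Phi^{n-1}}{\partial y^{n-1}} \end{pmatrix}(x^1, \dots, x^m, y^1, \dots, y^{n-1})$$

is nonzero.

Then by the induction hypothesis there exist an interval
$I^{m+n-1} = I_x^m \times I_y^{n-1} \subset \tilde I_x^m \times \tilde I_y^{n-1}$, which is a neighborhood of
$(x_0^1, \dots, x_0^m, y_0^1, \dots, y_0^{n-1})$ in $\mathbb{R}^{m+n-1}$, and a mapping $f \in C^{(p)}(I_x^m; I_y^{n-1})$ such that
the system (8.98) is equivalent on the interval $I^{m+n-1} = I_x^m \times I_y^{n-1}$ to the relations

$$\begin{cases} y^1 = f^1(x^1, \dots, x^m), \\ \quad\cdots \\ y^{n-1} = f^{n-1}(x^1, \dots, x^m), \end{cases} \qquad x \in I_x^m. \tag{8.101}$$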

d) Since $I_y^{n-1} \subset \tilde I_y^{n-1}$ and $I_x^m \subset \tilde I_x^m$, substituting $f^1, \dots, f^{n-1}$ from
(8.101) in place of the corresponding variables in the function

$$y^n = \tilde f(x^1, \dots, x^m, y^1, \dots, y^{n-1})$$

from (8.97) we obtain a relation

$$y^n = f^n(x^1, \dots, x^m) \tag{8.102}$$

between $y^n$ and $(x^1, \dots, x^m)$.

e) We now show that the system

$$\begin{cases} y^1 = f^1(x^1, \dots, x^m), \\ \quad\cdots \\ y^n = f^n(x^1, \dots, x^m), \end{cases} \qquad x \in I_x^m, \tag{8.103}$$

which defines a mapping $f \in C^{(p)}(I_x^m; I_y^n)$, where $I_y^n = I_y^{n-1} \times I_y^1$, is equivalent
to the system of equations (8.90) in the neighborhood $I^{m+n} = I_x^m \times I_y^n$.

In fact, inside $\tilde I^{m+n} = (\tilde I_x^m \times \tilde I_y^{n-1}) \times I_y^1$ we began by replacing
the last equation of the original system (8.90) with the equality
$y^n = \tilde f(x, y^1, \dots, y^{n-1})$, which is equivalent to it by virtue of (8.97). From the
second system so obtained, we passed to a third system equivalent to it by
replacing the variable $y^n$ in the first $n - 1$ equations with $\tilde f(x, y^1, \dots, y^{n-1})$.
We then replaced the first $n - 1$ equations (8.98) of the third system inside
$I_x^m \times I_y^{n-1} \times I_y^1$ with relations (8.101), which are equivalent to them.
In that way, we obtained a fourth system, after which we passed to the final
system (8.103), which is equivalent to it inside $I_x^m \times I_y^{n-1} \times I_y^1 = I^{m+n}$, by
replacing the variables $y^1, \dots, y^{n-1}$ with their expressions (8.101) in the last
equation $y^n = \tilde f(x^1, \dots, x^m, y^1, \dots, y^{n-1})$ of the fourth system, obtaining
(8.102) as the last equation.

f) To complete the proof of the theorem it remains only to verify formula
(8.96).

Since the systems (8.90) and (8.91) are equivalent in the neighborhood
$I_x^m \times I_y^n$ of the point $(x_0, y_0)$, it follows that

$$F(x, f(x)) \equiv 0, \quad \text{if } x \in I_x^m.$$

In coordinates this means that in the domain $I_x^m$

$$F^k\bigl(x^1, \dots, x^m, f^1(x^1, \dots, x^m), \dots, f^n(x^1, \dots, x^m)\bigr) \equiv 0 \quad (k = 1, \dots, n). \tag{8.104}$$

Since $f \in C^{(p)}(I_x^m; I_y^n)$ and $F \in C^{(p)}(U; \mathbb{R}^n)$, where $p \ge 1$, it follows that
$F(\cdot, f(\cdot)) \in C^{(p)}(I_x^m; \mathbb{R}^n)$ and, differentiating the identity (8.104), we obtain

$$\frac{\partial F^k}{\partial x^i} + \sum_{j=1}^{n} \frac{\partial F^k}{\partial y^j} \cdot \frac{\partial f^j}{\partial x^i} = 0 \quad (k = 1, \dots, n;\ i = 1, \dots, m). \tag{8.105}$$

Relations (8.105) are obviously equivalent to the single matrix equality

$$F'_x(x,y) + F'_y(x,y) \cdot f'(x) = 0,$$

in which $y = f(x)$.

Taking account of the invertibility of the matrix $F'_y(x,y)$ in a neighborhood
of the point $(x_0, y_0)$, we find by this equality that

$$f'(x) = -\bigl[F'_y(x, f(x))\bigr]^{-1}\bigl[F'_x(x, f(x))\bigr],$$

and the theorem is completely proved. $\square$
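
Formula (8.96) can be illustrated on a small system. The sketch below assumes $m = 1$, $n = 2$ and the system $F^1 = x^2 + (y^1)^2 + (y^2)^2 - 3$, $F^2 = x + y^1 - y^2 - 1$ near the point $(x_0, y_0) = (1; 1, 1)$; the system, the function names, and the use of Newton's method to evaluate $f$ are all choices made here for illustration and are not part of the proof above. It computes $f'(x)$ by solving the linear system $F'_y \cdot f'(x) = -F'_x$ and compares the result with difference quotients.

```python
def F(x, y1, y2):
    """Illustrative system; F(1, 1, 1) = (0, 0)."""
    return (x * x + y1 * y1 + y2 * y2 - 3.0, x + y1 - y2 - 1.0)

def F_x(x, y1, y2):
    """The n x m matrix (8.93); here a single column, stored as a pair."""
    return (2.0 * x, 1.0)

def F_y(x, y1, y2):
    """The n x n matrix (8.94), stored row by row."""
    return ((2.0 * y1, 2.0 * y2), (1.0, -1.0))

def solve2(A, rhs):
    """Solve the 2 x 2 linear system A v = rhs by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0.0:
        raise ZeroDivisionError("F'_y is not invertible at this point")
    return ((d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det)

def implicit_f(x, guess=(1.0, 1.0), steps=50):
    """Evaluate y = f(x) near y0 = (1, 1) by Newton's method in y (an assumption of this sketch)."""
    y1, y2 = guess
    for _ in range(steps):
        d1, d2 = solve2(F_y(x, y1, y2), F(x, y1, y2))
        y1, y2 = y1 - d1, y2 - d2
    return y1, y2

def derivative_by_formula(x):
    """Formula (8.96): f'(x) = -[F'_y]^{-1} [F'_x] at the point (x, f(x))."""
    y1, y2 = implicit_f(x)
    v1, v2 = solve2(F_y(x, y1, y2), F_x(x, y1, y2))
    return (-v1, -v2)

if __name__ == "__main__":
    h = 1e-5
    plus, minus = implicit_f(1.0 + h), implicit_f(1.0 - h)
    print(derivative_by_formula(1.0))
    print(tuple((p - q) / (2.0 * h) for p, q in zip(plus, minus)))
```

Solving the linear system instead of forming the inverse matrix explicitly is a common numerical design choice; for a 2 x 2 matrix the two are equivalent.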


8.5.5 Problems and Exercises 


1. On the plane $\mathbb{R}^2$ with coordinates x and y a curve is defined by the relation
$F(x,y) = 0$, where $F \in C^{(2)}(\mathbb{R}^2, \mathbb{R})$. Let $(x_0, y_0)$ be a noncritical point of the
function $F(x,y)$ lying on the curve.

a) Write the equation of the tangent to this curve at this point $(x_0, y_0)$.

b) Show that if $(x_0, y_0)$ is a point of inflection of the curve, then the following
equality holds:

$$\bigl(F''_{xx}(F'_y)^2 - 2F''_{xy}F'_xF'_y + F''_{yy}(F'_x)^2\bigr)(x_0, y_0) = 0.$$

c) Find a formula for the curvature of the curve at the point $(x_0, y_0)$.


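2. The Legendre transform in m variables. The Legendre transform of $x^1, \dots, x^m$
and the function $f(x^1, \dots, x^m)$ is the transformation to the new variables $\xi_1, \dots, \xi_m$
and function $f^*(\xi_1, \dots, \xi_m)$ defined by the relations

$$\xi_i = \frac{\partial f}{\partial x^i}(x^1, \dots, x^m) \quad (i = 1, \dots, m), \qquad f^*(\xi_1, \dots, \xi_m) = \sum_{i=1}^{m} \xi_i x^i - f(x^1, \dots, x^m). \tag{8.106}$$

a) Give a geometric interpretation of the Legendre transform (8.106) as the
transition from the coordinates $(x^1, \dots, x^m, f(x^1, \dots, x^m))$ of a point on the graph of
the function $f(x)$ to the parameters $(\xi_1, \dots, \xi_m, f^*(\xi_1, \dots, \xi_m))$ defining the
equation of the plane tangent to the graph at that point.

b) Show that the Legendre transform is guaranteed to be possible locally if
$f \in C^{(2)}$ and $\det\Bigl(\frac{\partial^2 f}{\partial x^i \partial x^j}\Bigr) \ne 0$.

c) Using the same definition of convexity for a function $f(x) = f(x^1, \dots, x^m)$
as in the one-dimensional case (taking x to be the vector $(x^1, \dots, x^m) \in \mathbb{R}^m$), show
that the Legendre transform of a convex function is a convex function.

d) Show that

$$df^* = \sum_{i=1}^{m} x^i\, d\xi_i + \sum_{i=1}^{m} \xi_i\, dx^i - df = \sum_{i=1}^{m} x^i\, d\xi_i$$

and deduce from this relation that the Legendre transform is involutive, that is,
verify the equality

$$(f^*)^*(x) = f(x).$$

e) Taking account of d), write the transform (8.106) in the following form, which
is symmetric in the variables:

$$f^*(\xi_1, \dots, \xi_m) + f(x^1, \dots, x^m) = \sum_{i=1}^{m} \xi_i x^i, \qquad \xi_i = \frac{\partial f}{\partial x^i}(x^1, \dots, x^m), \quad x^i = \frac{\partial f^*}{\partial \xi_i}(\xi_1, \dots, \xi_m), \tag{8.107}$$

or, more briefly, in the form

$$f^*(\xi) + f(x) = \xi x, \qquad \xi = \nabla f(x), \quad x = \nabla f^*(\xi),$$

where

$$\nabla f(x) = \Bigl(\frac{\partial f}{\partial x^1}, \dots, \frac{\partial f}{\partial x^m}\Bigr)(x), \qquad \nabla f^*(\xi) = \Bigl(\frac{\partial f^*}{\partial \xi_1}, \dots, \frac{\partial f^*}{\partial \xi_m}\Bigr)(\xi), \qquad \xi x = \xi_i x^i = \sum_{i=1}^{m} \xi_i x^i.$$

f) The matrix formed from the second-order partial derivatives of a function
(and sometimes the determinant of this matrix) is called the Hessian of the function
at a given point.

Let $d_{ij}$ and $d^*_{ij}$ be the co-factors of the elements $\frac{\partial^2 f}{\partial x^i \partial x^j}$ and $\frac{\partial^2 f^*}{\partial \xi_i \partial \xi_j}$ of the
Hessians

$$\begin{pmatrix} \frac{\partial^2 f}{\partial x^1 \partial x^1} & \cdots & \frac{\partial^2 f}{\partial x^1 \partial x^m} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x^m \partial x^1} & \cdots & \frac{\partial^2 f}{\partial x^m \partial x^m} \end{pmatrix}(x), \qquad \begin{pmatrix} \frac{\partial^2 f^*}{\partial \xi_1 \partial \xi_1} & \cdots & \frac{\partial^2 f^*}{\partial \xi_1 \partial \xi_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f^*}{\partial \xi_m \partial \xi_1} & \cdots & \frac{\partial^2 f^*}{\partial \xi_m \partial \xi_m} \end{pmatrix}(\xi)$$

of the functions $f(x)$ and $f^*(\xi)$, and let $d$ and $d^*$ be the determinants of these
matrices. Assuming that $d \ne 0$, show that $d \cdot d^* = 1$ and that

$$\frac{\partial^2 f}{\partial x^i \partial x^j}(x) = \frac{d^*_{ij}}{d^*}(\xi), \qquad \frac{\partial^2 f^*}{\partial \xi_i \partial \xi_j}(\xi) = \frac{d_{ij}}{d}(x).$$

g) A soap film spanning a wire frame forms a so-called minimal surface, having
minimal area among all the surfaces spanning the contour.

If that surface is locally defined as the graph of a function $z = f(x,y)$, it turns
out that the function $f$ must satisfy the following equation for minimal surfaces:

$$\bigl(1 + (f'_y)^2\bigr) f''_{xx} - 2 f'_x f'_y f''_{xy} + \bigl(1 + (f'_x)^2\bigr) f''_{yy} = 0.$$

Show that after a Legendre transform is performed this equation is brought into
the form

$$(1 + \eta^2)(f^*)''_{\eta\eta} + 2\xi\eta\,(f^*)''_{\xi\eta} + (1 + \xi^2)(f^*)''_{\xi\xi} = 0.$$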
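
For orientation in Problem 2, the definition (8.106) can be tried numerically in the case $m = 1$. The sketch below is an illustration under stated assumptions: it takes $f(x) = e^x$, for which $\xi = e^x$ and $f^*(\xi) = \xi \ln \xi - \xi$, and it solves $f'(x) = \xi$ by bisection, which presupposes that $f'$ is strictly increasing on the chosen search interval. The function name `legendre` and the interval bounds are hypothetical choices made here.

```python
import math

def legendre(f, df, xi, lo=-50.0, hi=50.0, iterations=200):
    """Return (f*(xi), x), where x solves df(x) = xi and f*(xi) = xi*x - f(x).

    Assumes df is strictly increasing on [lo, hi] and df(lo) <= xi <= df(hi).
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if df(mid) < xi:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return xi * x - f(x), x

def f_star(xi):
    """Closed form of the transform of exp: f*(xi) = xi*ln(xi) - xi, for xi > 0."""
    return xi * math.log(xi) - xi

def df_star(xi):
    """Derivative of f*; by part d) it recovers the original variable x = ln(xi)."""
    return math.log(xi)

if __name__ == "__main__":
    value, x = legendre(math.exp, math.exp, 2.0)
    print(value, f_star(2.0), x)
    # Applying the transform to f* returns the original function (involutivity).
    back, xi = legendre(f_star, df_star, 0.7, lo=1e-3, hi=1e3)
    print(back, math.exp(0.7))
```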
3. Canonical variables and the Hamilton equations.⁸

a) In the calculus of variations and the fundamental principles of classical
mechanics the following system of equations, due to Euler and Lagrange, plays an
important role:

$$\Bigl(\frac{\partial L}{\partial x} - \frac{d}{dt}\frac{\partial L}{\partial v}\Bigr)(t, x, v) = 0, \qquad v = \dot x(t), \tag{8.108}$$

where $L(t, x, v)$ is a given function of the variables t, x, v, of which t is usually time,
x the coordinate, and v the velocity.

The system (8.108) consists of two relations in three variables. Usually we wish
to determine $x = x(t)$ and $v = v(t)$ from (8.108), which essentially reduces to
determining the relation $x = x(t)$, since $v = \frac{dx}{dt}$.

Write the first equation of (8.108) in more detail, expanding the derivative $\frac{d}{dt}$
taking account of the equalities $x = x(t)$ and $v = v(t)$.

b) Show that if we change from the coordinates t, x, v, L to the so-called
canonical coordinates t, x, p, H by performing the Legendre transform (see Problem 2)

$$p = \frac{\partial L}{\partial v}, \qquad H = pv - L$$

with respect to the variables v and L to replace them with p and H, then the
Euler-Lagrange system (8.108) assumes the symmetric form

$$\dot p = -\frac{\partial H}{\partial x}, \qquad \dot x = \frac{\partial H}{\partial p}, \tag{8.109}$$

in which it is called the system of Hamilton equations.

c) In the multidimensional case, when $L = L(t, x^1, \dots, x^m, v^1, \dots, v^m)$, the
Euler-Lagrange system has the form

$$\Bigl(\frac{\partial L}{\partial x^i} - \frac{d}{dt}\frac{\partial L}{\partial v^i}\Bigr)(t, x, v) = 0, \qquad v^i = \dot x^i(t) \quad (i = 1, \dots, m), \tag{8.110}$$

where for brevity we have set $x = (x^1, \dots, x^m)$, $v = (v^1, \dots, v^m)$.

By performing a Legendre transform with respect to the variables $v^1, \dots, v^m, L$,
change from the variables $t, x^1, \dots, x^m, v^1, \dots, v^m, L$ to the canonical variables
$t, x^1, \dots, x^m, p_1, \dots, p_m, H$ and show that in these variables the system (8.110)
becomes the following system of Hamilton equations:

$$\dot p_i = -\frac{\partial H}{\partial x^i}, \qquad \dot x^i = \frac{\partial H}{\partial p_i} \quad (i = 1, \dots, m). \tag{8.111}$$


⁸ W. R. Hamilton (1805-1865): famous Irish mathematician and specialist in
mechanics. He stated a variational principle (Hamilton's principle), constructed a
phenomenological theory of optic phenomena, and was the creator of quaternions
and the founder of vector analysis (in fact, the term "vector" is due to him).

4. The implicit function theorem.

The solution of this problem gives another proof of the fundamental theorem of
this section, perhaps less intuitive and constructive than the one given above, but
shorter.

a) Suppose the hypotheses of the implicit function theorem are satisfied, and
let

$$F^i_y(x,y) = \Bigl(\frac{\partial F^i}{\partial y^1}, \dots, \frac{\partial F^i}{\partial y^n}\Bigr)(x,y)$$

be the ith row of the matrix $F'_y(x,y)$.

Show that the determinant of the matrix formed from the vectors $F^i_y(x_i, y_i)$
is nonzero if all the points $(x_i, y_i)$ $(i = 1, \dots, n)$ lie in some sufficiently small
neighborhood $U = I_x^m \times I_y^n$ of $(x_0, y_0)$.

b) Show that, if for $x \in I_x^m$ there are points $y_1, y_2 \in I_y^n$ such that $F(x, y_1) = 0$
and $F(x, y_2) = 0$, then for each $i \in \{1, \dots, n\}$ there is a point $(x, y_i)$ lying on the
closed interval with endpoints $(x, y_1)$ and $(x, y_2)$ such that

$$F^i_y(x, y_i)(y_2 - y_1) = 0 \quad (i = 1, \dots, n).$$

Show that this implies that $y_1 = y_2$, that is, if the implicit function $f : I_x^m \to I_y^n$
exists, it is unique.

c) Show that if the open ball $B(y_0; r)$ is contained in $I_y^n$, then $F(x_0, y) \ne 0$ for
$\|y - y_0\|_{\mathbb{R}^n} = r > 0$.

d) The function $\|F(x_0, y)\|^2_{\mathbb{R}^n}$ is continuous and has a positive minimum value
$\mu$ on the sphere $\|y - y_0\|_{\mathbb{R}^n} = r$.

e) There exists $\delta > 0$ such that for $\|x - x_0\|_{\mathbb{R}^m} < \delta$ we have

$$\|F(x, y)\|^2_{\mathbb{R}^n} \ge \tfrac12 \mu, \quad \text{if } \|y - y_0\|_{\mathbb{R}^n} = r,$$
$$\|F(x, y)\|^2_{\mathbb{R}^n} < \tfrac12 \mu, \quad \text{if } y = y_0.$$

f) For any fixed x such that $\|x - x_0\| < \delta$ the function $\|F(x, y)\|^2_{\mathbb{R}^n}$ attains a
minimum at some interior point $y = f(x)$ of the open ball $\|y - y_0\|_{\mathbb{R}^n} < r$, and
since the matrix $F'_y(x, f(x))$ is invertible, it follows that $F(x, f(x)) = 0$. This
establishes the existence of the implicit function $f : B(x_0; \delta) \to B(y_0; r)$.

g) If $\Delta y = f(x + \Delta x) - f(x)$, then

$$\Delta y = -\bigl[\tilde F'_y\bigr]^{-1} \cdot \bigl[\tilde F'_x\bigr] \Delta x,$$

where $\tilde F'_y$ is the matrix whose rows are the vectors $F^i_y(x_i, y_i)$ $(i = 1, \dots, n)$, $(x_i, y_i)$
being a point on the closed interval with endpoints $(x,y)$ and $(x + \Delta x, y + \Delta y)$.
The symbol $\tilde F'_x$ has a similar meaning.

Show that this relation implies that the function $y = f(x)$ is continuous.

h) Show that

$$f'(x) = -\bigl[F'_y(x, f(x))\bigr]^{-1} \cdot \bigl[F'_x(x, f(x))\bigr].$$




5. "If $f(x,y,z) = 0$, then $\frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial x} \cdot \frac{\partial x}{\partial z} = -1$."

a) Give a precise meaning to this statement.

b) Verify that it holds in the example of Clapeyron's ideal gas equation

$$\frac{P \cdot V}{T} = \text{const}$$

and in the general case of a function of three variables.

c) Write the analogous statement for the relation $f(x^1, \dots, x^m) = 0$ among m
variables. Verify that it is correct.
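
As a numerical plausibility check for part b) of Problem 5 (not a substitute for the proof asked for), the sketch below solves the relation $PV/T = c$ for each variable in turn and multiplies the three partial derivatives, each approximated by a central difference with the remaining variable held fixed. The value of the constant `C`, the sample state, and the function names are arbitrary assumptions made here.

```python
C = 8.314  # an arbitrary positive value of the constant P*V/T; any c > 0 would do

def P_of(V, T):
    """P as a function of V and T on the surface P*V/T = C."""
    return C * T / V

def V_of(T, P):
    """V as a function of T and P on the surface P*V/T = C."""
    return C * T / P

def T_of(P, V):
    """T as a function of P and V on the surface P*V/T = C."""
    return P * V / C

def central(g, t, h):
    """Symmetric difference quotient approximating g'(t)."""
    return (g(t + h) - g(t - h)) / (2.0 * h)

def cyclic_product(P0, V0):
    """(dP/dV at fixed T) * (dV/dT at fixed P) * (dT/dP at fixed V) at the state (P0, V0)."""
    T0 = T_of(P0, V0)
    dP_dV = central(lambda V: P_of(V, T0), V0, 1e-6 * V0)
    dV_dT = central(lambda T: V_of(T, P0), T0, 1e-6 * T0)
    dT_dP = central(lambda P: T_of(P, V0), P0, 1e-6 * P0)
    return dP_dV * dV_dT * dT_dP

if __name__ == "__main__":
    print(cyclic_product(101325.0, 0.0224))
```

The step sizes are taken proportional to the variables themselves so that the three difference quotients have comparable relative accuracy.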
6. Show that the roots of the equation

$$z^n + c_1 z^{n-1} + \dots + c_n = 0$$

are smooth functions of the coefficients, at least when they are all distinct.


8.6 Some Corollaries of the Implicit Function Theorem 


8.6.1 The Inverse Function Theorem 


Definition 1. A mapping $f : U \to V$, where $U$ and $V$ are open subsets
of $\mathbb{R}^m$, is a $C^{(p)}$-diffeomorphism or a diffeomorphism of smoothness p
$(p = 0, 1, \dots)$, if

1) $f \in C^{(p)}(U; V)$;

2) $f$ is a bijection;

3) $f^{-1} \in C^{(p)}(V; U)$.

A $C^{(0)}$-diffeomorphism is called a homeomorphism.


As a rule, in this book we shall consider only the smooth case, that is,
the case $p \in \mathbb{N}$ or $p = \infty$.

The basic idea of the following frequently used theorem is that if the
differential of a mapping is invertible at a point, then the mapping itself is
invertible in some neighborhood of the point.


Theorem 1. (Inverse function theorem). If a mapping $f : G \to \mathbb{R}^m$ of a
domain $G \subset \mathbb{R}^m$ is such that

1° $f \in C^{(p)}(G; \mathbb{R}^m)$, $p \ge 1$,
2° $y_0 = f(x_0)$ at $x_0 \in G$,
3° $f'(x_0)$ is invertible,

then there exists a neighborhood $U(x_0) \subset G$ of $x_0$ and a neighborhood $V(y_0)$
of $y_0$ such that $f : U(x_0) \to V(y_0)$ is a $C^{(p)}$-diffeomorphism. Moreover, if
$x \in U(x_0)$ and $y = f(x) \in V(y_0)$, then

$$(f^{-1})'(y) = \bigl(f'(x)\bigr)^{-1}.$$

Proof. We rewrite the relation $y = f(x)$ in the form

$$F(x,y) = f(x) - y = 0. \tag{8.112}$$

The function $F(x,y) = f(x) - y$ is defined for $x \in G$ and $y \in \mathbb{R}^m$, that is,
it is defined in the neighborhood $G \times \mathbb{R}^m$ of the point $(x_0, y_0) \in \mathbb{R}^m \times \mathbb{R}^m$.

We wish to solve Eq. (8.112) with respect to x in some neighborhood of
$(x_0, y_0)$. By hypotheses 1°, 2°, 3° of the theorem the mapping $F(x,y)$ has
the property that

$$F \in C^{(p)}(G \times \mathbb{R}^m; \mathbb{R}^m), \quad p \ge 1,$$
$$F(x_0, y_0) = 0,$$
$$F'_x(x_0, y_0) = f'(x_0) \text{ is invertible.}$$

By the implicit function theorem there exist a neighborhood $I_x \times I_y$ of
$(x_0, y_0)$ and a mapping $g \in C^{(p)}(I_y; I_x)$ such that

$$f(x) - y = 0 \iff x = g(y) \tag{8.113}$$

for any point $(x,y) \in I_x \times I_y$ and

$$g'(y) = -\bigl[F'_x(x,y)\bigr]^{-1}\bigl[F'_y(x,y)\bigr].$$

In the present case

$$F'_x(x,y) = f'(x), \qquad F'_y(x,y) = -E,$$

where $E$ is the identity matrix; therefore

$$g'(y) = \bigl(f'(x)\bigr)^{-1}. \tag{8.114}$$

If we set $V = I_y$ and $U = g(V)$, relation (8.113) shows that the mappings
$f : U \to V$ and $g : V \to U$ are mutually inverse, that is, $g = f^{-1}$ on $V$.

Since $V = I_y$, it follows that $V$ is a neighborhood of $y_0$. This means that
under hypotheses 1°, 2°, and 3° the image $y_0 = f(x_0)$ of $x_0 \in G$, which is an
interior point of $G$, is an interior point of the image $f(G)$ of $G$. By formula
(8.114) the matrix $g'(y_0)$ is invertible. Therefore the mapping $g : V \to U$ has
properties 1°, 2°, and 3° relative to the domain $V$ and the point $y_0 \in V$.
Hence by what has already been proved $x_0 = g(y_0)$ is an interior point of
$U = g(V)$.

Since by (8.114) hypotheses 1°, 2°, and 3° obviously hold at any point
$y \in V$, any point $x = g(y)$ is an interior point of $U$. Thus $U$ is an open (and
obviously even connected) neighborhood of $x_0 \in \mathbb{R}^m$.

We have now verified that the mapping $f : U \to V$ satisfies all the conditions
of Definition 1 and the assertion of Theorem 1. $\square$


We shall now give several examples that illustrate Theorem 1.

The inverse function theorem is very often used in converting from one
coordinate system to another. The simplest version of such a change of
coordinates was studied in analytic geometry and linear algebra and has the
form

$$\begin{pmatrix} y^1 \\ \vdots \\ y^m \end{pmatrix} = \begin{pmatrix} a^1_1 & \cdots & a^1_m \\ \vdots & \ddots & \vdots \\ a^m_1 & \cdots & a^m_m \end{pmatrix} \begin{pmatrix} x^1 \\ \vdots \\ x^m \end{pmatrix}$$

or, in compact notation, $y^j = a^j_i x^i$. This linear transformation $A : \mathbb{R}^m_x \to \mathbb{R}^m_y$
has an inverse $A^{-1} : \mathbb{R}^m_y \to \mathbb{R}^m_x$ defined on the entire space $\mathbb{R}^m_y$ if and only if
the matrix $(a^j_i)$ is invertible, that is, $\det(a^j_i) \ne 0$.

The inverse function theorem is a local version of this proposition, based
on the fact that in a neighborhood of a point a smooth mapping behaves
approximately like its differential at the point.


Example 1. Polar coordinates. The mapping $f : \mathbb{R}^2_+ \to \mathbb{R}^2$ of the half-plane $\mathbb{R}^2_+ = \{(\rho, \varphi) \in \mathbb{R}^2 \mid \rho \ge 0\}$ onto the plane $\mathbb{R}^2$ defined by the formula

$$\begin{aligned} x &= \rho\cos\varphi, \\ y &= \rho\sin\varphi, \end{aligned} \tag{8.115}$$

is illustrated in Fig. 8.5.

The Jacobian of this mapping, as can be easily computed, is $\rho$, that is, it is nonzero in a neighborhood of any point $(\rho, \varphi)$, where $\rho > 0$. Therefore formulas (8.115) are locally invertible and hence locally the numbers $\rho$ and $\varphi$ can be taken as new coordinates of the point previously determined by the Cartesian coordinates $x$ and $y$.

The coordinates $(\rho, \varphi)$ are a well-known system of curvilinear coordinates on the plane, namely polar coordinates. Their geometric interpretation is shown in Fig. 8.5. We note that by the periodicity of the functions $\cos\varphi$ and $\sin\varphi$ the mapping (8.115) is only locally a diffeomorphism when $\rho > 0$; it is not bijective on the entire plane. That is the reason that the change from Cartesian to polar coordinates always involves a choice of a branch of the argument $\varphi$ (that is, an indication of its range of variation).
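
Both claims about (8.115), that the Jacobian equals $\rho$ and that the mapping is not globally injective, can be illustrated numerically. This is only a sketch: the sample point and the function names are assumptions made here, and the Jacobian is estimated by central differences rather than computed symbolically.

```python
import math

def polar_to_cartesian(rho, phi):
    """Mapping (8.115): (rho, phi) -> (x, y)."""
    return (rho * math.cos(phi), rho * math.sin(phi))

def polar_jacobian(rho, phi, h=1e-6):
    """Determinant of the Jacobi matrix of (8.115), by central differences."""
    d_rho = [(a - b) / (2 * h) for a, b in zip(polar_to_cartesian(rho + h, phi),
                                               polar_to_cartesian(rho - h, phi))]
    d_phi = [(a - b) / (2 * h) for a, b in zip(polar_to_cartesian(rho, phi + h),
                                               polar_to_cartesian(rho, phi - h))]
    # d_rho = (dx/drho, dy/drho), d_phi = (dx/dphi, dy/dphi).
    return d_rho[0] * d_phi[1] - d_rho[1] * d_phi[0]

rho, phi = 2.0, 0.7
jac = polar_jacobian(rho, phi)            # expected to be close to rho
base_point = polar_to_cartesian(rho, phi)
# Shifting the angle by a full period gives the same Cartesian point,
# so the mapping cannot be injective on the whole half-plane.
same_point = polar_to_cartesian(rho, phi + 2 * math.pi)
```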


Polar coordinates $(\rho, \psi, \varphi)$ in three-dimensional space $\mathbb{R}^3$ are called spherical coordinates. They are connected with Cartesian coordinates by the formulas

$$\begin{aligned} z &= \rho\cos\psi, \\ y &= \rho\sin\psi\sin\varphi, \\ x &= \rho\sin\psi\cos\varphi. \end{aligned} \tag{8.116}$$

The geometric meaning of the parameters $\rho$, $\psi$, and $\varphi$ is shown in Fig. 8.6.

The Jacobian of the mapping (8.116) is $\rho^2\sin\psi$, and so by Theorem 1 the mapping is invertible in a neighborhood of each point $(\rho, \psi, \varphi)$ at which $\rho > 0$ and $\sin\psi \ne 0$.
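
The value $\rho^2\sin\psi$ of the Jacobian of (8.116) can be confirmed numerically. The sketch below is illustrative only: the sample points, the helper `jacobian_det3`, and the ordering of the Cartesian coordinates as $(x, y, z)$ are assumptions made here.

```python
import math

def spherical_to_cartesian(rho, psi, phi):
    """Mapping (8.116), with the output ordered as (x, y, z)."""
    return (rho * math.sin(psi) * math.cos(phi),
            rho * math.sin(psi) * math.sin(phi),
            rho * math.cos(psi))

def jacobian_det3(func, p, h=1e-6):
    """Jacobian determinant of a map R^3 -> R^3 at p, by central differences."""
    rows = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        fp, fm = func(*plus), func(*minus)
        rows.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    # rows[i][j] = d func_j / d p_i is the transpose of the Jacobi matrix;
    # transposition does not change the determinant.
    a, b, c = rows
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

rho, psi, phi = 2.0, 0.9, 0.4
jac = jacobian_det3(spherical_to_cartesian, (rho, psi, phi))
# On the z-axis (psi = 0) the Jacobian vanishes, so Theorem 1 does not apply there.
jac_on_axis = jacobian_det3(spherical_to_cartesian, (rho, 0.0, phi))
```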

The sets where $\rho = \text{const}$, $\varphi = \text{const}$, or $\psi = \text{const}$ in $(x, y, z)$-space obviously correspond to a spherical surface (a sphere of radius $\rho$), a half-plane passing through the $z$-axis, and the surface of a cone whose axis is the $z$-axis respectively.

Thus in passing from coordinates $(x, y, z)$ to coordinates $(\rho, \psi, \varphi)$, for example, the spherical surface and the conical surface are flattened; they correspond to pieces of the planes $\rho = \text{const}$ and $\psi = \text{const}$ respectively. We observed a similar phenomenon in the two-dimensional case, where an arc of a circle in the $(x, y)$-plane corresponded to a closed interval on the line in the plane with coordinates $(\rho, \varphi)$ (see Fig. 8.5). Please note that this is a local straightening.

In the $m$-dimensional case polar coordinates are introduced by the relations

$$\begin{aligned} x^1 &= \rho\cos\varphi_1, \\ x^2 &= \rho\sin\varphi_1\cos\varphi_2, \\ &\;\;\vdots \\ x^{m-1} &= \rho\sin\varphi_1\sin\varphi_2\cdots\sin\varphi_{m-2}\cos\varphi_{m-1}, \\ x^m &= \rho\sin\varphi_1\sin\varphi_2\cdots\sin\varphi_{m-2}\sin\varphi_{m-1}. \end{aligned} \tag{8.117}$$

The Jacobian of this transformation is

$$\rho^{m-1}\sin^{m-2}\varphi_1\,\sin^{m-3}\varphi_2\cdots\sin\varphi_{m-2}, \tag{8.118}$$

and by Theorem 1 it is also locally invertible everywhere where this Jacobian is nonzero.


Example 2. The general idea of local rectification of curves. New coordinates 
are usually introduced for the purpose of simplifying the analytic expression 
for the objects that occur in a problem and making them easier to visualize 
in the new notation. 

Suppose, for example, a curve in the plane $\mathbb{R}^2$ is defined by the equation

$$F(x, y) = 0.$$

Assume that $F$ is a smooth function, that the point $(x_0, y_0)$ lies on the curve, that is, $F(x_0, y_0) = 0$, and that this point is not a critical point of $F$. For example, suppose $F'_y(x_0, y_0) \ne 0$.

Let us try to choose coordinates $\xi, \eta$ so that in these coordinates a closed interval of a coordinate line, for example, the line $\eta = 0$, corresponds to an arc of this curve.

We set

$$\xi = x - x_0, \qquad \eta = F(x, y).$$

The Jacobi matrix

$$\begin{pmatrix} 1 & 0 \\ F'_x & F'_y \end{pmatrix}(x, y)$$

of this transformation has as its determinant the number $F'_y(x, y)$, which by assumption is nonzero at $(x_0, y_0)$. Then by Theorem 1, this mapping is a diffeomorphism of a neighborhood of $(x_0, y_0)$ onto a neighborhood of the point $(\xi, \eta) = (0, 0)$. Hence, inside this neighborhood, the numbers $\xi$ and $\eta$ can be taken as new coordinates of points lying in a neighborhood of $(x_0, y_0)$. In the new coordinates, the curve obviously has the equation $\eta = 0$, and in this sense we have indeed achieved a local rectification of it (see Fig. 8.7).
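
As a concrete instance of this rectification, one may take the unit circle. The sketch below is an illustration under assumptions made here: the choice $F(x, y) = x^2 + y^2 - 1$, the base point $(x_0, y_0) = (0, 1)$, and all function names are hypothetical and do not come from the text.

```python
import math

def F(x, y):
    """Assumed example: the unit circle is the curve F(x, y) = 0."""
    return x * x + y * y - 1.0

def F_y(x, y):
    """Partial derivative of F with respect to y."""
    return 2.0 * y

X0, Y0 = 0.0, 1.0  # a point of the curve at which F_y is nonzero

def rectify(x, y):
    """New coordinates: xi = x - x0, eta = F(x, y)."""
    return (x - X0, F(x, y))

# Points of an arc of the circle near (x0, y0) ...
arc = [(math.sin(t), math.cos(t)) for t in (-0.3, -0.1, 0.0, 0.2)]
# ... are sent to points of the coordinate line eta = 0 (up to rounding).
images = [rectify(x, y) for x, y in arc]
```

The determinant of the Jacobi matrix of `rectify` is `F_y(x, y) = 2 * y`, which is nonzero near $(0, 1)$ but vanishes at $(\pm 1, 0)$, where this particular choice of coordinates fails.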
8.6.2 Local Reduction of a Smooth Mapping to Canonical Form


In this subsection we shall consider only one question of this type. To be specific, we shall exhibit a canonical form to which one can locally reduce any smooth mapping of constant rank by means of a suitable choice of coordinates.

We recall that the rank of a mapping $f : U \to \mathbb{R}^n$ of a domain $U \subset \mathbb{R}^m$ at a point $x \in U$ is the rank of the linear transformation tangent to it at the point, that is, the rank of the matrix $f'(x)$. The rank of a mapping at a point is usually denoted $\operatorname{rank} f(x)$.


Theorem 2. (The rank theorem). Let $f : U \to \mathbb{R}^n$ be a mapping defined in a neighborhood $U \subset \mathbb{R}^m$ of a point $x_0 \in \mathbb{R}^m$. If $f \in C^{(p)}(U; \mathbb{R}^n)$, $p \ge 1$, and the mapping $f$ has the same rank $k$ at every point $x \in U$, then there exist neighborhoods $O(x_0)$ of $x_0$ and $O(y_0)$ of $y_0 = f(x_0)$ and diffeomorphisms $u = \varphi(x)$, $v = \psi(y)$ of those neighborhoods, of class $C^{(p)}$, such that the mapping $v = \psi \circ f \circ \varphi^{-1}(u)$ has the coordinate representation

$$(u^1, \ldots, u^k, \ldots, u^m) = u \mapsto v = (v^1, \ldots, v^n) = (u^1, \ldots, u^k, 0, \ldots, 0) \tag{8.119}$$

in the neighborhood $O(u_0) = \varphi(O(x_0))$ of $u_0 = \varphi(x_0)$.


In other words, the theorem asserts (see Fig. 8.8) that one can choose coordinates $(u^1, \ldots, u^m)$ in place of $(x^1, \ldots, x^m)$ and $(v^1, \ldots, v^n)$ in place of $(y^1, \ldots, y^n)$ in such a way that locally the mapping has the form (8.119) in the new coordinates, that is, the canonical form for a linear transformation of rank $k$.
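Proof. We write the coordinate representation

$$\begin{aligned} y^1 &= f^1(x^1, \ldots, x^m), \\ &\;\;\vdots \\ y^n &= f^n(x^1, \ldots, x^m) \end{aligned} \tag{8.120}$$

of the mapping $f : U \to \mathbb{R}^n_y$, which is defined in a neighborhood of the point $x_0 \in \mathbb{R}^m_x$. In order to avoid relabeling the coordinates and the neighborhood $U$, we shall assume that at every point $x \in U$, the principal minor of order $k$ in the upper left corner of the matrix $f'(x)$ is nonzero.

Let us consider the mapping defined in a neighborhood $U$ of $x_0$ by the equalities

$$\begin{aligned} u^1 &= \varphi^1(x^1, \ldots, x^m) = f^1(x^1, \ldots, x^m), \\ &\;\;\vdots \\ u^k &= \varphi^k(x^1, \ldots, x^m) = f^k(x^1, \ldots, x^m), \\ u^{k+1} &= \varphi^{k+1}(x^1, \ldots, x^m) = x^{k+1}, \\ &\;\;\vdots \\ u^m &= \varphi^m(x^1, \ldots, x^m) = x^m. \end{aligned} \tag{8.121}$$

The Jacobi matrix of this mapping has the form

$$\left(\begin{array}{ccc|ccc} \dfrac{\partial f^1}{\partial x^1} & \cdots & \dfrac{\partial f^1}{\partial x^k} & \dfrac{\partial f^1}{\partial x^{k+1}} & \cdots & \dfrac{\partial f^1}{\partial x^m} \\ \vdots & & \vdots & \vdots & & \vdots \\ \dfrac{\partial f^k}{\partial x^1} & \cdots & \dfrac{\partial f^k}{\partial x^k} & \dfrac{\partial f^k}{\partial x^{k+1}} & \cdots & \dfrac{\partial f^k}{\partial x^m} \\ \hline & & & 1 & & 0 \\ & 0 & & & \ddots & \\ & & & 0 & & 1 \end{array}\right),$$

and by assumption its determinant is nonzero in $U$.

By the inverse function theorem, the mapping $u = \varphi(x)$ is a diffeomorphism of smoothness $p$ of some neighborhood $\tilde{O}(x_0) \subset U$ of $x_0$ onto a neighborhood $\tilde{O}(u_0) = \varphi(\tilde{O}(x_0))$ of $u_0 = \varphi(x_0)$.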
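Comparing relations (8.120) and (8.121), we see that the composite function $g = f \circ \varphi^{-1} : \tilde{O}(u_0) \to \mathbb{R}^n_y$ has the coordinate representation

$$\begin{aligned} y^1 &= f^1 \circ \varphi^{-1}(u^1, \ldots, u^m) = u^1, \\ &\;\;\vdots \\ y^k &= f^k \circ \varphi^{-1}(u^1, \ldots, u^m) = u^k, \\ y^{k+1} &= f^{k+1} \circ \varphi^{-1}(u^1, \ldots, u^m) = g^{k+1}(u^1, \ldots, u^m), \\ &\;\;\vdots \\ y^n &= f^n \circ \varphi^{-1}(u^1, \ldots, u^m) = g^n(u^1, \ldots, u^m). \end{aligned} \tag{8.122}$$

Since the mapping $\varphi^{-1} : \tilde{O}(u_0) \to \tilde{O}(x_0)$ has maximal rank $m$ at each point $u \in \tilde{O}(u_0)$, and the mapping $f : \tilde{O}(x_0) \to \mathbb{R}^n_y$ has rank $k$ at every point $x \in \tilde{O}(x_0)$, it follows, as is known from linear algebra, that the matrix $g'(u) = f'(\varphi^{-1}(u))\,(\varphi^{-1})'(u)$ has rank $k$ at every point $u \in \tilde{O}(u_0)$.

Direct computation of the Jacobi matrix of the mapping (8.122) yields

$$\left(\begin{array}{ccc|ccc} 1 & & 0 & & & \\ & \ddots & & & 0 & \\ 0 & & 1 & & & \\ \hline \dfrac{\partial g^{k+1}}{\partial u^1} & \cdots & \dfrac{\partial g^{k+1}}{\partial u^k} & \dfrac{\partial g^{k+1}}{\partial u^{k+1}} & \cdots & \dfrac{\partial g^{k+1}}{\partial u^m} \\ \vdots & & \vdots & \vdots & & \vdots \\ \dfrac{\partial g^n}{\partial u^1} & \cdots & \dfrac{\partial g^n}{\partial u^k} & \dfrac{\partial g^n}{\partial u^{k+1}} & \cdots & \dfrac{\partial g^n}{\partial u^m} \end{array}\right).$$

Since this matrix has rank $k$ and its upper left block is the identity matrix of order $k$, the lower right block must vanish. Hence at each point $u \in \tilde{O}(u_0)$ we obtain $\dfrac{\partial g^j}{\partial u^i}(u) = 0$ for $i = k+1, \ldots, m$; $j = k+1, \ldots, n$. Assuming that the neighborhood $\tilde{O}(u_0)$ is convex (which can be achieved by shrinking $\tilde{O}(u_0)$ to a ball with center at $u_0$, for example), we can conclude from this that the functions $g^j$, $j = k+1, \ldots, n$, really are independent of the variables $u^{k+1}, \ldots, u^m$.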


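After this decisive observation, we can rewrite the mapping (8.122) as

$$\begin{aligned} y^1 &= u^1, \\ &\;\;\vdots \\ y^k &= u^k, \\ y^{k+1} &= g^{k+1}(u^1, \ldots, u^k), \\ &\;\;\vdots \\ y^n &= g^n(u^1, \ldots, u^k). \end{aligned} \tag{8.123}$$

We can now exhibit the mapping $\psi$. We set

$$\begin{aligned} v^1 &= y^1 =: \psi^1(y), \\ &\;\;\vdots \\ v^k &= y^k =: \psi^k(y), \\ v^{k+1} &= y^{k+1} - g^{k+1}(y^1, \ldots, y^k) =: \psi^{k+1}(y), \\ &\;\;\vdots \\ v^n &= y^n - g^n(y^1, \ldots, y^k) =: \psi^n(y). \end{aligned} \tag{8.124}$$

It is clear from the construction of the functions $g^j$ ($j = k+1, \ldots, n$) that the mapping $\psi$ is defined in a neighborhood of $y_0$ and belongs to class $C^{(p)}$ in that neighborhood.

The Jacobi matrix of the mapping (8.124) has the form

$$\left(\begin{array}{ccc|ccc} 1 & & 0 & & & \\ & \ddots & & & 0 & \\ 0 & & 1 & & & \\ \hline -\dfrac{\partial g^{k+1}}{\partial y^1} & \cdots & -\dfrac{\partial g^{k+1}}{\partial y^k} & 1 & & 0 \\ \vdots & & \vdots & & \ddots & \\ -\dfrac{\partial g^n}{\partial y^1} & \cdots & -\dfrac{\partial g^n}{\partial y^k} & 0 & & 1 \end{array}\right).$$

Its determinant equals 1, and so by Theorem 1 the mapping $\psi$ is a diffeomorphism of smoothness $p$ of some neighborhood $\tilde{O}(y_0)$ of $y_0 \in \mathbb{R}^n_y$ onto a neighborhood $\tilde{O}(v_0) = \psi(\tilde{O}(y_0))$ of $v_0 \in \mathbb{R}^n_v$.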

Comparing relations (8.123) and (8.124), we see that in a neighborhood $O(u_0) \subset \tilde{O}(u_0)$ of $u_0$ so small that $g(O(u_0)) \subset \tilde{O}(y_0)$, the mapping $\psi \circ f \circ \varphi^{-1} : O(u_0) \to \mathbb{R}^n_v$ is a mapping of smoothness $p$ from this neighborhood onto some neighborhood $O(v_0) \subset \tilde{O}(v_0)$ of $v_0 \in \mathbb{R}^n_v$ and that it has the canonical form

$$\begin{aligned} v^1 &= u^1, \\ &\;\;\vdots \\ v^k &= u^k, \\ v^{k+1} &= 0, \\ &\;\;\vdots \\ v^n &= 0. \end{aligned} \tag{8.125}$$

Setting $\varphi^{-1}(O(u_0)) = O(x_0)$ and $\psi^{-1}(O(v_0)) = O(y_0)$, we obtain the neighborhoods of $x_0$ and $y_0$ whose existence is asserted in the theorem. The proof is now complete. □
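
The construction in the proof can be followed on a small example. The sketch below is only an illustration under assumptions made here: the mapping $f(x^1, x^2) = (x^1 + x^2, (x^1 + x^2)^2)$, which has rank 1 at every point, and the function names are hypothetical. For this $f$ the recipe (8.121) gives $\varphi(x) = (x^1 + x^2, x^2)$, and (8.124) gives $\psi(y) = (y^1, y^2 - (y^1)^2)$.

```python
def f(x1, x2):
    """Assumed example of constant rank 1: both components depend on x1 + x2 only."""
    s = x1 + x2
    return (s, s * s)

def phi_inverse(u1, u2):
    """Inverse of phi(x) = (f^1(x), x2) = (x1 + x2, x2), built as in (8.121)."""
    return (u1 - u2, u2)

def psi(y1, y2):
    """psi(y) = (y1, y2 - g^2(y1)) with g^2(y1) = y1**2, built as in (8.124)."""
    return (y1, y2 - y1 * y1)

def canonical(u1, u2):
    """The composite psi o f o phi^{-1}, which should have the form (8.125)."""
    return psi(*f(*phi_inverse(u1, u2)))

samples = [(0.0, 0.0), (0.5, -1.0), (-2.0, 3.0), (1.25, 0.75)]
# Each image should be (u1, 0): the canonical form of a rank-1 mapping.
images = [canonical(*u) for u in samples]
```

In this example the diffeomorphisms happen to be defined globally; in general the theorem guarantees them only in neighborhoods of $x_0$ and $y_0$.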


Theorem 2, like Theorem 1, is obviously a local version of the correspond- 
ing theorem from linear algebra. 

In connection with the proof just given of Theorem 2, we make the fol- 
lowing remarks, which will be useful in what follows. 


Remark 1. If the rank of the mapping $f : U \to \mathbb{R}^n$ is $n$ at every point of the original neighborhood $U \subset \mathbb{R}^m$, then the point $y_0 = f(x_0)$, where $x_0 \in U$, is an interior point of $f(U)$, that is, $f(U)$ contains a neighborhood of this point.


Proof. Indeed, from what was just proved, the mapping $\psi \circ f \circ \varphi^{-1} : O(u_0) \to O(v_0)$ has the form

$$(u^1, \ldots, u^n, \ldots, u^m) = u \mapsto v = (v^1, \ldots, v^n) = (u^1, \ldots, u^n)$$

in this case, and so the image of a neighborhood of $u_0 = \varphi(x_0)$ contains some neighborhood of $v_0 = \psi \circ f \circ \varphi^{-1}(u_0)$.

But the mappings $\varphi : O(x_0) \to O(u_0)$ and $\psi : O(y_0) \to O(v_0)$ are diffeomorphisms, and therefore they map interior points to interior points. Writing the original mapping $f$ as $f = \psi^{-1} \circ (\psi \circ f \circ \varphi^{-1}) \circ \varphi$, we conclude that $y_0 = f(x_0)$ is an interior point of the image of a neighborhood of $x_0$. □


Remark 2. If the rank of the mapping $f : U \to \mathbb{R}^n$ is $k$ at every point of a neighborhood $U$ and $k < n$, then, by Eqs. (8.120), (8.124), and (8.125), in some neighborhood of $x_0 \in U \subset \mathbb{R}^m$ the following $n - k$ relations hold:

$$f^i(x^1, \ldots, x^m) = g^i\big(f^1(x^1, \ldots, x^m), \ldots, f^k(x^1, \ldots, x^m)\big) \quad (i = k+1, \ldots, n). \tag{8.126}$$

These relations are written under the assumption we have made that the principal minor of order $k$ of the matrix $f'(x_0)$ is nonzero, that is, the rank $k$ is realized on the set of functions $f^1, \ldots, f^k$. Otherwise one may relabel the functions $f^1, \ldots, f^n$ and again have this situation.
8.6.3 Functional Dependence


Definition 2. A system of continuous functions $f^i(x) = f^i(x^1, \ldots, x^m)$ ($i = 1, \ldots, n$) is functionally independent in a neighborhood of a point $x_0 = (x_0^1, \ldots, x_0^m)$ if for any continuous function $F(y) = F(y^1, \ldots, y^n)$ defined in a neighborhood of $y_0 = (y_0^1, \ldots, y_0^n) = (f^1(x_0), \ldots, f^n(x_0)) = f(x_0)$, the relation

$$F\big(f^1(x^1, \ldots, x^m), \ldots, f^n(x^1, \ldots, x^m)\big) \equiv 0$$

is possible at all points of a neighborhood of $x_0$ only when $F(y^1, \ldots, y^n) \equiv 0$ in a neighborhood of $y_0$.


The linear independence studied in algebra is independence with respect to linear relations

$$F(y^1, \ldots, y^n) = \lambda_1 y^1 + \cdots + \lambda_n y^n.$$

If a system is not functionally independent, it is said to be functionally dependent.

When vectors are linearly dependent, one of them obviously is a linear 
combination of the others. A similar situation holds in the relation of func- 
tional dependence of a system of smooth functions. 


Proposition 1. If a system $f^i(x^1, \ldots, x^m)$ ($i = 1, \ldots, n$) of smooth functions defined on a neighborhood $U(x_0)$ of the point $x_0 \in \mathbb{R}^m$ is such that the rank of the matrix

$$\begin{pmatrix} \dfrac{\partial f^1}{\partial x^1} & \cdots & \dfrac{\partial f^1}{\partial x^m} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f^n}{\partial x^1} & \cdots & \dfrac{\partial f^n}{\partial x^m} \end{pmatrix}(x)$$

is equal to the same number $k$ at every point $x \in U$, then

a) when $k = n$, the system is functionally independent in a neighborhood of $x_0$;

b) when $k < n$, there exist a neighborhood of $x_0$ and $k$ functions of the system, say $f^1, \ldots, f^k$, such that the other $n - k$ functions can be represented as

$$f^i(x^1, \ldots, x^m) = g^i\big(f^1(x^1, \ldots, x^m), \ldots, f^k(x^1, \ldots, x^m)\big)$$

in this neighborhood, where $g^i(y^1, \ldots, y^k)$ ($i = k+1, \ldots, n$) are smooth functions defined in a neighborhood of $y_0 = (f^1(x_0), \ldots, f^n(x_0))$ and depending only on $k$ coordinates of the variable point $y = (y^1, \ldots, y^n)$.


Proof. In fact, if $k = n$, then by Remark 1 after the rank theorem, the image of a neighborhood of the point $x_0$ under the mapping

$$\begin{aligned} y^1 &= f^1(x^1, \ldots, x^m), \\ &\;\;\vdots \\ y^n &= f^n(x^1, \ldots, x^m) \end{aligned} \tag{8.127}$$

contains a neighborhood of $y_0 = f(x_0)$. But then the relation

$$F\big(f^1(x^1, \ldots, x^m), \ldots, f^n(x^1, \ldots, x^m)\big) \equiv 0$$

can hold in a neighborhood of $x_0$ only if

$$F(y^1, \ldots, y^n) \equiv 0$$

in a neighborhood of $y_0$. This proves assertion a).


If $k < n$ and the rank $k$ of the mapping (8.127) is realized on the functions $f^1, \ldots, f^k$, then by Remark 2 after the rank theorem, there exists a neighborhood of $y_0 = f(x_0)$ and $n - k$ functions $g^i(y) = g^i(y^1, \ldots, y^k)$ ($i = k+1, \ldots, n$), defined on that neighborhood, having the same order of smoothness as the functions of the original system, and such that relations (8.126) hold in some neighborhood of $x_0$. This proves b). □


We have now shown that if $k < n$ there exist $n - k$ special functions $F^i(y) = y^i - g^i(y^1, \ldots, y^k)$ ($i = k+1, \ldots, n$) that establish the relations

$$F^i\big(f^1(x), \ldots, f^k(x), f^i(x)\big) \equiv 0 \quad (i = k+1, \ldots, n)$$

between the functions of the system $f^1, \ldots, f^k, \ldots, f^n$ in a neighborhood of the point $x_0$.
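
A classical illustration of case b) is the system of three functions of three variables used in the sketch below. This example is an assumption introduced here, not taken from the text: $f^1 = x + y + z$, $f^2 = x^2 + y^2 + z^2$, $f^3 = xy + yz + zx$, which satisfy $f^3 = g^3(f^1, f^2)$ with $g^3(y^1, y^2) = \big((y^1)^2 - y^2\big)/2$. Near the sample point the Jacobi matrix has rank 2: its determinant vanishes identically, while a minor of order 2 is nonzero.

```python
def system(x, y, z):
    """Assumed example system (f^1, f^2, f^3) of three smooth functions."""
    return (x + y + z, x * x + y * y + z * z, x * y + y * z + z * x)

def jacobi_matrix(x, y, z):
    """Jacobi matrix of the system, computed by hand."""
    return [[1.0, 1.0, 1.0],
            [2 * x, 2 * y, 2 * z],
            [y + z, x + z, x + y]]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    a, b, c = m
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def g3(y1, y2):
    """The function expressing f^3 through f^1 and f^2."""
    return (y1 * y1 - y2) / 2.0

point = (1.0, 2.0, 3.0)
J = jacobi_matrix(*point)
determinant = det3(J)                            # the rank is less than 3
minor = J[0][0] * J[1][1] - J[0][1] * J[1][0]    # nonzero, so the rank is 2
f1, f2, f3 = system(*point)
```

At points with $x = y = z$ the rank of this system drops to 1, so the constant-rank hypothesis of Proposition 1 holds only away from that line.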


8.6.4 Local Resolution of a Diffeomorphism 
into a Composition of Elementary Ones 


In this subsection we shall show how, using the inverse function theorem, one 
can represent a diffeomorphic mapping locally as a composition of diffeomor- 
phisms, each of which changes only one coordinate. 


Definition 3. A diffeomorphism $g : U \to \mathbb{R}^m$ of an open set $U \subset \mathbb{R}^m$ will be called elementary if its coordinate representation is

$$\begin{aligned} y^i &= x^i, \quad i \in \{1, \ldots, m\}, \ i \ne j, \\ y^j &= g^j(x^1, \ldots, x^m), \end{aligned}$$

that is, under the diffeomorphism $g : U \to \mathbb{R}^m$ only one coordinate of the point being mapped is changed.
Proposition 2. If $f : G \to \mathbb{R}^m$ is a diffeomorphism of an open set $G \subset \mathbb{R}^m$, then for any point $x_0 \in G$ there is a neighborhood of the point in which the representation $f = g_1 \circ \cdots \circ g_n$ holds, where $g_1, \ldots, g_n$ are elementary diffeomorphisms.


Proof. We shall verify this by induction.

If the original mapping $f$ is itself elementary, the proposition holds trivially for it.

Assume that the proposition holds for diffeomorphisms that alter at most $(k - 1)$ coordinates, where $k - 1 < n$. Now consider a diffeomorphism $f : G \to \mathbb{R}^m$ that alters $k$ coordinates:

$$\begin{aligned} y^1 &= f^1(x^1, \ldots, x^m), \\ &\;\;\vdots \\ y^k &= f^k(x^1, \ldots, x^m), \\ y^{k+1} &= x^{k+1}, \\ &\;\;\vdots \\ y^m &= x^m. \end{aligned} \tag{8.128}$$

We have assumed that it is the first $k$ coordinates that are changed, which can be achieved by linear changes of variable. Hence this assumption causes no loss in generality.

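Since $f$ is a diffeomorphism, its Jacobi matrix $f'(x)$ is nondegenerate at each point, for

$$(f^{-1})'\big(f(x)\big)\, f'(x) = E.$$

Let us fix $x_0 \in G$ and compute the determinant of $f'(x_0)$:

$$\left|\begin{array}{ccc|ccc} \dfrac{\partial f^1}{\partial x^1} & \cdots & \dfrac{\partial f^1}{\partial x^k} & \dfrac{\partial f^1}{\partial x^{k+1}} & \cdots & \dfrac{\partial f^1}{\partial x^m} \\ \vdots & & \vdots & \vdots & & \vdots \\ \dfrac{\partial f^k}{\partial x^1} & \cdots & \dfrac{\partial f^k}{\partial x^k} & \dfrac{\partial f^k}{\partial x^{k+1}} & \cdots & \dfrac{\partial f^k}{\partial x^m} \\ \hline & & & 1 & & 0 \\ & 0 & & & \ddots & \\ & & & 0 & & 1 \end{array}\right|(x_0) = \begin{vmatrix} \dfrac{\partial f^1}{\partial x^1} & \cdots & \dfrac{\partial f^1}{\partial x^k} \\ \vdots & & \vdots \\ \dfrac{\partial f^k}{\partial x^1} & \cdots & \dfrac{\partial f^k}{\partial x^k} \end{vmatrix}(x_0) \ne 0.$$

Thus one of the minors of order $k - 1$ of this last determinant must be nonzero. Again, for simplicity of notation, we shall assume that the principal minor of order $k - 1$ is nonzero. Now consider the auxiliary mapping $g : G \to \mathbb{R}^m$ defined by the equalities

$$\begin{aligned} u^1 &= f^1(x^1, \ldots, x^m), \\ &\;\;\vdots \\ u^{k-1} &= f^{k-1}(x^1, \ldots, x^m), \\ u^k &= x^k, \\ &\;\;\vdots \\ u^m &= x^m. \end{aligned} \tag{8.129}$$

Since the Jacobian

$$\left|\begin{array}{ccc|ccc} \dfrac{\partial f^1}{\partial x^1} & \cdots & \dfrac{\partial f^1}{\partial x^{k-1}} & \dfrac{\partial f^1}{\partial x^k} & \cdots & \dfrac{\partial f^1}{\partial x^m} \\ \vdots & & \vdots & \vdots & & \vdots \\ \dfrac{\partial f^{k-1}}{\partial x^1} & \cdots & \dfrac{\partial f^{k-1}}{\partial x^{k-1}} & \dfrac{\partial f^{k-1}}{\partial x^k} & \cdots & \dfrac{\partial f^{k-1}}{\partial x^m} \\ \hline & & & 1 & & 0 \\ & 0 & & & \ddots & \\ & & & 0 & & 1 \end{array}\right|(x_0) = \begin{vmatrix} \dfrac{\partial f^1}{\partial x^1} & \cdots & \dfrac{\partial f^1}{\partial x^{k-1}} \\ \vdots & & \vdots \\ \dfrac{\partial f^{k-1}}{\partial x^1} & \cdots & \dfrac{\partial f^{k-1}}{\partial x^{k-1}} \end{vmatrix}(x_0) \ne 0$$

of the mapping $g : G \to \mathbb{R}^m$ is nonzero at $x_0 \in G$, the mapping $g$ is a diffeomorphism in some neighborhood of $x_0$.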

Then, in some neighborhood of $u_0 = g(x_0)$ the mapping inverse to $g$, $x = g^{-1}(u)$, is defined, making it possible to introduce new coordinates $(u^1, \ldots, u^m)$ in a neighborhood of $x_0$.

Let $h = f \circ g^{-1}$. In other words, the mapping $y = h(u)$ is the mapping (8.128) $y = f(x)$ written in $u$-coordinates. The mapping $h$, being the composition of diffeomorphisms, is a diffeomorphism of some neighborhood of $u_0$. Its coordinate expression obviously has the form

$$\begin{aligned} y^1 &= f^1 \circ g^{-1}(u) = u^1, \\ &\;\;\vdots \\ y^{k-1} &= f^{k-1} \circ g^{-1}(u) = u^{k-1}, \\ y^k &= f^k \circ g^{-1}(u), \\ y^{k+1} &= u^{k+1}, \\ &\;\;\vdots \\ y^m &= u^m, \end{aligned}$$

that is, $h$ is an elementary diffeomorphism.

But $f = h \circ g$, and by the induction hypothesis the mapping $g$ defined by (8.129) can be resolved into a composition of elementary diffeomorphisms. Thus, the diffeomorphism $f$, which alters $k$ coordinates, can also be resolved into a composition of elementary diffeomorphisms in a neighborhood of $x_0$, which completes the induction. □
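
One step of this induction can be traced on a two-dimensional example. The sketch below rests on assumptions made here: the mapping $f(x^1, x^2) = (x^1 + (x^2)^2,\ x^1 + 2x^2)$, whose Jacobian at the origin equals 2, and the function names are hypothetical. Following (8.129) with $k = 2$, we take $g(x) = (f^1(x), x^2)$ and $h = f \circ g^{-1}$; each of them changes only one coordinate.

```python
def f(x1, x2):
    """Assumed example of a local diffeomorphism near the origin."""
    return (x1 + x2 * x2, x1 + 2.0 * x2)

def g(x1, x2):
    """Elementary mapping (8.129): changes only the first coordinate."""
    return (x1 + x2 * x2, x2)

def g_inverse(u1, u2):
    """Explicit inverse of g for this example."""
    return (u1 - u2 * u2, u2)

def h(u1, u2):
    """h = f o g^{-1}: keeps u1 and changes only the second coordinate."""
    return f(*g_inverse(u1, u2))

samples = [(0.0, 0.0), (0.1, -0.2), (-0.3, 0.25), (0.05, 0.4)]
# f and the composition h o g should agree at every sample point.
direct = [f(*x) for x in samples]
composed = [h(*g(*x)) for x in samples]
# The first coordinate of h(u) should coincide with u1.
first_coordinate_kept = all(abs(h(*g(*x))[0] - g(*x)[0]) < 1e-12 for x in samples)
```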


8.6.5 Morse’s Lemma 


This same circle of ideas contains an intrinsically beautiful lemma of Morse⁹ on the local reduction of smooth real-valued functions to canonical form in a neighborhood of a nondegenerate critical point. This lemma is also important in applications.


Definition 4. Let $x_0$ be a critical point of the function $f \in C^{(2)}(U; \mathbb{R})$ defined in a neighborhood $U$ of this point.
The critical point $x_0$ is a nondegenerate critical point of $f$ if the Hessian of the function at that point (that is, the matrix $\dfrac{\partial^2 f}{\partial x^i \partial x^j}(x_0)$ formed from the second-order partial derivatives) has a nonzero determinant.


If $x_0$ is a critical point of the function, that is, $f'(x_0) = 0$, then by Taylor’s formula

$$f(x) - f(x_0) = \frac{1}{2!} \sum_{i,j} \frac{\partial^2 f}{\partial x^i \partial x^j}(x_0)\,(x^i - x_0^i)(x^j - x_0^j) + o(\|x - x_0\|^2). \tag{8.130}$$

Morse’s lemma asserts that one can make a local change of coordinates $x = g(y)$ such that the function will have the form

$$(f \circ g)(y) - f(x_0) = -(y^1)^2 - \cdots - (y^k)^2 + (y^{k+1})^2 + \cdots + (y^m)^2$$

when expressed in $y$-coordinates.

If the remainder term $o(\|x - x_0\|^2)$ were not present on the right-hand side of Eq. (8.130), that is, the difference $f(x) - f(x_0)$ were a simple quadratic form, then, as is known from algebra, it could be brought into the indicated canonical form by a linear transformation. Thus the assertion we are about to prove is a local version of the theorem on reduction of a quadratic form to canonical form. The proof will use the idea of the proof of this algebraic theorem. We shall also rely on the inverse function theorem and the following proposition.


Hadamard’s lemma. Let $f : U \to \mathbb{R}$ be a function of class $C^{(p)}(U; \mathbb{R})$, $p \ge 1$, defined in a convex neighborhood $U$ of the point $0 = (0, \ldots, 0) \in \mathbb{R}^m$ and such that $f(0) = 0$. Then there exist functions $g_i \in C^{(p-1)}(U; \mathbb{R})$ ($i = 1, \ldots, m$) such that the equality

$$f(x^1, \ldots, x^m) = \sum_{i=1}^m x^i g_i(x^1, \ldots, x^m) \tag{8.131}$$

holds in $U$, and $g_i(0) = \dfrac{\partial f}{\partial x^i}(0)$.


⁹ H. C. M. Morse (1892-1977), American mathematician; his main work was devoted to the application of topological methods in various areas of analysis.


Proof. Equality (8.131) is essentially another useful expression for Taylor’s formula with the integral form of the remainder term. It follows from the equalities

$$f(x^1, \ldots, x^m) = \int_0^1 \frac{d f(tx^1, \ldots, tx^m)}{dt}\,dt = \sum_{i=1}^m x^i \int_0^1 \frac{\partial f}{\partial x^i}(tx^1, \ldots, tx^m)\,dt,$$

if we set

$$g_i(x^1, \ldots, x^m) = \int_0^1 \frac{\partial f}{\partial x^i}(tx^1, \ldots, tx^m)\,dt \quad (i = 1, \ldots, m).$$

The fact that $g_i(0) = \dfrac{\partial f}{\partial x^i}(0)$ ($i = 1, \ldots, m$) is obvious, and it is also not difficult to verify that $g_i \in C^{(p-1)}(U; \mathbb{R})$. However, we shall not undertake the verification just now, since we shall later give a general rule for differentiating an integral depending on a parameter, from which the property we need for the functions $g_i$ will follow immediately.


Thus, up to this verification, Hadamard’s formula (8.131) is proved. □
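
Hadamard’s formula (8.131) lends itself to a direct numerical check. The sketch below is an illustration under assumptions made here: the function $f(x, y) = \sin x + xy + y^3$ (with $f(0) = 0$), the sample point, and the use of the composite Simpson rule for the integrals defining $g_i$ are choices of this example, not of the text.

```python
import math

def f(x, y):
    """Assumed example with f(0, 0) = 0."""
    return math.sin(x) + x * y + y ** 3

def df_dx(x, y):
    return math.cos(x) + y

def df_dy(x, y):
    return x + 3 * y ** 2

def simpson(func, n=200):
    """Composite Simpson rule on [0, 1] with an even number n of subintervals."""
    h = 1.0 / n
    total = func(0.0) + func(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(k * h)
    return total * h / 3

def g1(x, y):
    """g_1(x) = integral over [0, 1] of (df/dx)(t x) dt."""
    return simpson(lambda t: df_dx(t * x, t * y))

def g2(x, y):
    """g_2(x) = integral over [0, 1] of (df/dy)(t x) dt."""
    return simpson(lambda t: df_dy(t * x, t * y))

point = (0.8, -0.5)
# Right-hand side of (8.131); it should reproduce f(point) up to quadrature error.
reconstructed = point[0] * g1(*point) + point[1] * g2(*point)
```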


Morse’s lemma. If $f : G \to \mathbb{R}$ is a function of class $C^{(3)}(G; \mathbb{R})$ defined on an open set $G \subset \mathbb{R}^m$ and $x_0 \in G$ is a nondegenerate critical point of that function, then there exists a diffeomorphism $g : V \to U$ of some neighborhood $V$ of the origin $0$ in $\mathbb{R}^m$ onto a neighborhood $U$ of $x_0$ such that

$$(f \circ g)(y) = f(x_0) - \big[(y^1)^2 + \cdots + (y^k)^2\big] + \big[(y^{k+1})^2 + \cdots + (y^m)^2\big]$$

for all $y \in V$.


Proof. By linear changes of variable we can reduce the problem to the case when $x_0 = 0$ and $f(x_0) = 0$, and from now on we shall assume that these conditions hold.

Since $x_0 = 0$ is a critical point of $f$, we have $g_i(0) = 0$ in formula (8.131) ($i = 1, \ldots, m$). Then, also by Hadamard’s lemma,

$$g_i(x^1, \ldots, x^m) = \sum_{j=1}^m x^j h_{ij}(x^1, \ldots, x^m),$$

where $h_{ij}$ are smooth functions in a neighborhood of $0$ and consequently

$$f(x^1, \ldots, x^m) = \sum_{i,j=1}^m x^i x^j h_{ij}(x^1, \ldots, x^m). \tag{8.132}$$

By making the substitution $\tilde{h}_{ij} = \frac{1}{2}(h_{ij} + h_{ji})$ if necessary, we can assume that $h_{ij} = h_{ji}$. We remark also that, by the uniqueness of the Taylor expansion, the continuity of the functions $h_{ij}$ implies that $h_{ij}(0) = \dfrac{1}{2}\dfrac{\partial^2 f}{\partial x^i \partial x^j}(0)$ and hence the matrix $(h_{ij}(0))$ is nondegenerate.

The function $f$ has now been written in a manner that resembles a quadratic form, and we wish, so to speak, to reduce it to diagonal form.

As in the classical case, we proceed by induction.
Assume that there exist coordinates $u^1, \ldots, u^m$ in a neighborhood $U_1$ of $0 \in \mathbb{R}^m$, that is, a diffeomorphism $x = \varphi(u)$, such that

$$(f \circ \varphi)(u) = \pm(u^1)^2 \pm \cdots \pm (u^{r-1})^2 + \sum_{i,j=r}^m u^i u^j H_{ij}(u^1, \ldots, u^m) \tag{8.133}$$

in the coordinates $u^1, \ldots, u^m$, where $r \ge 1$ and $H_{ij} = H_{ji}$.

We observe that relation (8.133) holds for $r = 1$, as one can see from (8.132), where $H_{ij} = h_{ij}$.

By the hypothesis of the lemma the quadratic form $\sum_{i,j=1}^m x^i x^j h_{ij}(0)$ is nondegenerate, that is, $\det(h_{ij}(0)) \ne 0$. The change of variable $x = \varphi(u)$ is carried out by a diffeomorphism, so that $\det\varphi'(0) \ne 0$. But then the matrix of the quadratic form $\pm(u^1)^2 \pm \cdots \pm (u^{r-1})^2 + \sum_{i,j=r}^m u^i u^j H_{ij}(0)$ obtained from the matrix $(h_{ij}(0))$ through right-multiplication by the matrix $\varphi'(0)$ and left-multiplication by the transpose of $\varphi'(0)$ is also nondegenerate. Consequently, at least one of the numbers $H_{ij}(0)$ ($i, j = r, \ldots, m$) is nonzero. By a linear change of variable we can bring the form $\sum_{i,j=r}^m u^i u^j H_{ij}(0)$ to diagonal form, and so we may assume that $H_{rr}(0) \ne 0$ in Eq. (8.133). By the continuity of the functions $H_{ij}(u)$ the inequality $H_{rr}(u) \ne 0$ will also hold in some neighborhood of $u = 0$.

Let us set $\psi(u^1, \ldots, u^m) = \sqrt{|H_{rr}(u)|}$. Then the function $\psi$ belongs to the class $C^{(1)}(U_2; \mathbb{R})$ in some neighborhood $U_2 \subset U_1$ of $u = 0$. We now change to coordinates $(v^1, \ldots, v^m)$ by the formulas

$$\begin{aligned} v^i &= u^i, \quad i \ne r, \\ v^r &= \psi(u^1, \ldots, u^m)\left(u^r + \sum_{i>r} \frac{u^i H_{ir}(u^1, \ldots, u^m)}{H_{rr}(u^1, \ldots, u^m)}\right). \end{aligned} \tag{8.134}$$
The Jacobian of the transformation (8.134) at $u = 0$ is obviously equal to $\psi(0)$, that is, it is nonzero. Then by the inverse function theorem we can assert that in some neighborhood $U_3 \subset U_2$ of $u = 0$ the mapping $v = \psi(u)$ defined by (8.134) is a diffeomorphism of class $C^{(1)}(U_3; \mathbb{R}^m)$ and therefore the variables $(v^1, \ldots, v^m)$ can indeed serve as coordinates of points in $U_3$.

We now separate off in Eq. (8.133) all terms

$$u^r u^r H_{rr}(u^1, \ldots, u^m) + 2\sum_{j=r+1}^m u^r u^j H_{rj}(u^1, \ldots, u^m), \tag{8.135}$$

containing $u^r$. In the expression (8.135) for the sum of these terms we have used the fact that $H_{ij} = H_{ji}$.

Comparing (8.134) and (8.135), we see that we can rewrite (8.135) in the form

$$\pm v^r v^r - \frac{1}{H_{rr}}\left(\sum_{i>r} u^i H_{ir}(u^1, \ldots, u^m)\right)^2.$$

The ambiguous sign $\pm$ appears in front of $v^r v^r$ because $H_{rr} = \pm(\psi)^2$, the positive sign being taken if $H_{rr} > 0$ and the negative sign if $H_{rr} < 0$.

Thus, after the substitution $v = \psi(u)$, the expression (8.133) becomes the equality

$$(f \circ \varphi \circ \psi^{-1})(v) = \sum_{i=1}^r \big[\pm(v^i)^2\big] + \sum_{i,j>r} v^i v^j \tilde{H}_{ij}(v^1, \ldots, v^m),$$

where $\tilde{H}_{ij}$ are new smooth functions that are symmetric with respect to the indices $i$ and $j$. The mapping $\varphi \circ \psi^{-1}$ is a diffeomorphism. Thus the induction from $r - 1$ to $r$ is now complete, and Morse’s lemma is proved. □
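
In dimension one the whole proof reduces to a single step, which can be tried out numerically. The following sketch is an illustration under assumptions made here: the function $f(x) = \cos x - 1$, which has a nondegenerate critical point at $0$ with $f''(0) = -1$, and the function names are hypothetical. Writing $f(x) = x^2 h(x)$ as in (8.132) and setting $y = x\sqrt{|h(x)|}$ as in (8.134) gives $f = -y^2$.

```python
import math

def f(x):
    """Assumed example: nondegenerate critical point at 0, f''(0) = -1."""
    return math.cos(x) - 1.0

def h(x):
    """The factor in f(x) = x**2 * h(x); its value at 0 is f''(0) / 2."""
    return -0.5 if x == 0.0 else (math.cos(x) - 1.0) / (x * x)

def new_coordinate(x):
    """The change of variable y = x * sqrt(|h(x)|), as in (8.134) with m = 1."""
    return x * math.sqrt(abs(h(x)))

samples = [-1.0, -0.3, 0.0, 0.5, 1.2]
ys = [new_coordinate(x) for x in samples]
# In the new coordinate the function should take the canonical form -y**2.
residuals = [f(x) + y * y for x, y in zip(samples, ys)]
```

The new coordinate increases with $x$ on this sample, consistent with its being a local diffeomorphism near the origin; the sign in front of $y^2$ is negative because $h(0) < 0$.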


8.6.6 Problems and Exercises 


1. Compute the Jacobian of the change of variable (8.118) from polar coordinates to Cartesian coordinates in $\mathbb{R}^m$.


2. a) Let $x_0$ be a noncritical point of a smooth function $F : U \to \mathbb{R}$ defined in a neighborhood $U$ of $x_0 = (x_0^1, \ldots, x_0^m) \in \mathbb{R}^m$. Show that in some neighborhood $\tilde{U} \subset U$ of $x_0$ one can introduce curvilinear coordinates $(\xi^1, \ldots, \xi^m)$ such that the set of points defined by the condition $F(x) = F(x_0)$ will be given by the equation $\xi^m = 0$ in these new coordinates.


b) Let $\varphi, \psi \in C^{(k)}(D; \mathbb{R})$, and suppose that $(\varphi(x) = 0) \Rightarrow (\psi(x) = 0)$ in the domain $D$. Show that if $\operatorname{grad}\varphi \ne 0$, then there is a decomposition $\psi = \theta \cdot \varphi$ in $D$, where $\theta \in C^{(k-1)}(D; \mathbb{R})$.


3. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be a smooth mapping satisfying the Cauchy-Riemann equations

$$\frac{\partial f^1}{\partial x^1} = \frac{\partial f^2}{\partial x^2}, \qquad \frac{\partial f^1}{\partial x^2} = -\frac{\partial f^2}{\partial x^1}.$$

a) Show that the Jacobian of such a mapping is zero at a point if and only if $f'(x)$ is the zero matrix at that point.

b) Show that if $f'(x) \ne 0$, then the inverse $f^{-1}$ to the mapping $f$ is defined in a neighborhood of $f(x)$ and also satisfies the Cauchy-Riemann equations.


4. Functional dependence (direct proof).

a) Show that the functions $\pi^i(x) = x^i$ ($i = 1, \ldots, m$), regarded as functions of the point $x = (x^1, \ldots, x^m) \in \mathbb{R}^m$, form an independent system of functions in a neighborhood of any point of $\mathbb{R}^m$.

b) Show that, for any function $f \in C(\mathbb{R}^m; \mathbb{R})$ the system $\pi^1, \ldots, \pi^m, f$ is functionally dependent.

c) If the system of smooth functions $f^1, \ldots, f^k$, $k < m$, is such that the rank of the mapping $f = (f^1, \ldots, f^k)$ equals $k$ at a point $x_0 = (x_0^1, \ldots, x_0^m) \in \mathbb{R}^m$, then in some neighborhood of this point one can complete it to an independent system $f^1, \ldots, f^m$ consisting of $m$ smooth functions.

d) If the system

$$\xi^i = f^i(x^1, \ldots, x^m) \quad (i = 1, \ldots, m)$$

of smooth functions is such that the mapping $f = (f^1, \ldots, f^m)$ has rank $m$ at the point $x_0 = (x_0^1, \ldots, x_0^m)$, then the variables $(\xi^1, \ldots, \xi^m)$ can be used as curvilinear coordinates in some neighborhood $U(x_0)$ of $x_0$, and any function $\varphi : U(x_0) \to \mathbb{R}$ can be written as $\varphi(x) = F\big(f^1(x), \ldots, f^m(x)\big)$, where $F = \varphi \circ f^{-1}$.

e) The rank of the mapping provided by a system of smooth functions is also called the rank of the system. Show that if the rank of a system of smooth functions $f^i(x^1, \ldots, x^m)$ ($i = 1, \ldots, k$) is $k$ and the rank of the system $f^1, \ldots, f^k, \varphi$ is also $k$ at some point $x_0 \in \mathbb{R}^m$, then $\varphi(x) = F\big(f^1(x), \ldots, f^k(x)\big)$ in a neighborhood of the point.

Hint: Use c) and d) and show that

$$F(f^1, \ldots, f^m) = F(f^1, \ldots, f^k).$$

5. Show that the rank of a smooth mapping $f : \mathbb{R}^m \to \mathbb{R}^n$ is a lower semicontinuous function, that is, $\operatorname{rank} f(x) \ge \operatorname{rank} f(x_0)$ in a neighborhood of a point $x_0 \in \mathbb{R}^m$.


6. a) Give a direct proof of Morse’s lemma for functions $f : \mathbb{R} \to \mathbb{R}$.


b) Determine whether Morse’s lemma is applicable at the origin to the following functions:

$$f(x) = x^3; \quad f(x) = x\sin\frac{1}{x}; \quad f(x) = e^{-1/x^2}\sin^2\frac{1}{x};$$
$$f(x, y) = x^3 - 3xy^2; \quad f(x, y) = x^2.$$


c) Show that nondegenerate critical points of a function $f \in C^{(3)}(\mathbb{R}^m; \mathbb{R})$ are isolated: each of them has a neighborhood in which it is the only critical point of $f$.


d) Show that the number $k$ of negative squares in the canonical representation of a function in the neighborhood of a nondegenerate critical point is independent of the reduction method, that is, independent of the coordinate system in which the function has canonical form. This number is called the index of the critical point.
8.7 Surfaces in $\mathbb{R}^n$ and the Theory of Extrema with Constraint


To acquire an informal understanding of the theory of extrema with constraint, which is important in applications, it is useful to have some elementary information on surfaces (manifolds) in $\mathbb{R}^n$.


8.7.1 k-Dimensional Surfaces in $\mathbb{R}^n$


Generalizing the concept of a law of motion of a point mass $x = x(t)$, we have previously introduced the concept of a path in $\mathbb{R}^n$ as a continuous mapping $\Gamma : I \to \mathbb{R}^n$ of an interval $I \subset \mathbb{R}$. The degree of smoothness of the path was defined as the degree of smoothness of this mapping. The support $\Gamma(I) \subset \mathbb{R}^n$ of a path can be a rather peculiar set in $\mathbb{R}^n$, which it would be a great stretch to call a curve in some instances. For example, the support of a path might be a single point.

Similarly, a continuous or smooth mapping $f : I^k \to \mathbb{R}^n$ of a $k$-dimensional interval $I^k \subset \mathbb{R}^k$, called a singular $k$-cell in $\mathbb{R}^n$, may have as its image $f(I^k)$ not at all what one would like to call a $k$-dimensional surface in $\mathbb{R}^n$. For example, it might again be simply a point.

In order for a smooth mapping $f : G \to \mathbb{R}^n$ of a domain $G \subset \mathbb{R}^k$ to define a $k$-dimensional geometric figure in $\mathbb{R}^n$ whose points are described by $k$ independent parameters $(t^1, \ldots, t^k) \in G$, it suffices, as we know from the preceding section, to require that the rank of the mapping $f : G \to \mathbb{R}^n$ be $k$ at each point $t \in G$ (naturally, $k \le n$). In that case the mapping $f : G \to f(G)$ is locally one-to-one (that is, in a neighborhood of each point $t \in G$).

Indeed, suppose $\operatorname{rank} f(t_0) = k$ and this rank is realized, for example, on the first $k$ of the $n$ functions

$$\begin{aligned} x^1 &= f^1(t^1, \ldots, t^k), \\ &\;\;\vdots \\ x^n &= f^n(t^1, \ldots, t^k) \end{aligned} \tag{8.136}$$

that define the coordinate expressions for the mapping $f : G \to \mathbb{R}^n$.

Then, by the inverse function theorem the variables $t^1, \ldots, t^k$ can be expressed in terms of $x^1, \ldots, x^k$ in some neighborhood $U(t_0)$ of $t_0$. It follows that the set $f(U(t_0))$ can be written as

$$x^{k+1} = \varphi^{k+1}(x^1, \ldots, x^k), \quad \ldots, \quad x^n = \varphi^n(x^1, \ldots, x^k)$$

(that is, it projects in a one-to-one manner onto the coordinate plane of $x^1, \ldots, x^k$), and therefore the mapping $f : U(t_0) \to f(U(t_0))$ is indeed one-to-one.
However, even the simple example of a smooth one-dimensional path (Fig. 8.9) makes it clear that the local injectivity of the mapping $f : G \to \mathbb{R}^n$ from the parameter domain $G$ into $\mathbb{R}^n$ is by no means necessarily a global injectivity. The trajectory may have multiple self-intersections, so that if we wish to define a smooth $k$-dimensional surface in $\mathbb{R}^n$ and picture it as a set that has the structure of a slightly deformed piece of a $k$-dimensional plane (a $k$-dimensional subspace of $\mathbb{R}^n$) near each of its points, it is not enough merely to map a canonical piece $G \subset \mathbb{R}^k$ of a $k$-dimensional plane in a regular manner into $\mathbb{R}^n$. It is also necessary to be sure that it happens to be globally imbedded in this space.

Fig. 8.9.
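
A path of the kind meant here is easy to write down explicitly. The sketch below uses an example assumed for illustration, not taken from the text: the path $t \mapsto (t^2 - 1,\ t(t^2 - 1))$ has nonzero velocity (rank 1) at every $t$, yet passes through the origin at both $t = -1$ and $t = 1$.

```python
def path(t):
    """Assumed example: a smooth regular path in the plane with a self-intersection."""
    return (t * t - 1.0, t * (t * t - 1.0))

def velocity(t):
    """Derivative of the path with respect to t."""
    return (2.0 * t, 3.0 * t * t - 1.0)

def speed_squared(t):
    vx, vy = velocity(t)
    return vx * vx + vy * vy

# The velocity never vanishes on a sample grid, so the rank is 1 there
# and the path is locally injective.
grid = [k / 100.0 for k in range(-300, 301)]
min_speed_squared = min(speed_squared(t) for t in grid)

# Nevertheless two different parameter values give the same point.
first_visit = path(-1.0)
second_visit = path(1.0)
```

Analytically, with $s = t^2$ the squared speed equals $9s^2 - 2s + 1$, whose minimum over $s \ge 0$ is $8/9$, so the velocity is indeed nonzero for all $t$ and not only on the grid.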


Definition 1. We shall call a set $S \subset \mathbb{R}^n$ a $k$-dimensional smooth surface in $\mathbb{R}^n$ (or a $k$-dimensional submanifold of $\mathbb{R}^n$) if for every point $x_0 \in S$ there exist a neighborhood $U(x_0)$ in $\mathbb{R}^n$ and a diffeomorphism $\varphi : U(x_0) \to I^n$ of this neighborhood onto the standard $n$-dimensional cube $I^n = \{t \in \mathbb{R}^n \mid |t^i| < 1,\ i = 1, \ldots, n\}$ of the space $\mathbb{R}^n$ under which the image of the set $S \cap U(x_0)$ is the portion of the $k$-dimensional plane in $\mathbb{R}^n$ defined by the relations $t^{k+1} = 0, \ldots, t^n = 0$ lying inside $I^n$ (Fig. 8.10).


Fig. 8.10.
We shall measure the degree of smoothness of the surface $S$ by the degree of smoothness of the diffeomorphism $\varphi$.

If we regard the variables $t^1, \ldots, t^n$ as new coordinates in the neighborhood $U(x_0)$, Definition 1 can be rewritten briefly as follows: the set $S \subset \mathbb{R}^n$ is a $k$-dimensional surface ($k$-dimensional submanifold) in $\mathbb{R}^n$ if for every point $x_0 \in S$ there is a neighborhood $U(x_0)$ and coordinates $t^1, \ldots, t^n$ in $U(x_0)$ such that in these coordinates the set $S \cap U(x_0)$ is defined by the relations

$$t^{k+1} = \cdots = t^n = 0.$$

The role of the standard $n$-dimensional cube in Definition 1 is rather artificial and approximately the same as the role of the standard size and shape of a page in a geographical atlas. The canonical location of the interval in the coordinate system $t^1, \ldots, t^n$ is also a matter of convention and nothing more, since any cube in $\mathbb{R}^n$ can always be transformed into the standard $n$-dimensional cube by an additional linear diffeomorphism.

We shall often use this remark when abbreviating the verification that a set $S \subset \mathbb{R}^n$ is a surface in $\mathbb{R}^n$.

Let us consider some examples.


Example 1. The space $\mathbb{R}^n$ itself is an $n$-dimensional surface of class $C^{(\infty)}$. As the mapping $\varphi : \mathbb{R}^n \to I^n$ here, one can take, for example, the mapping

$$\xi^i = \frac{2}{\pi}\arctan x^i \quad (i = 1,\dots,n) .$$
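The mapping of Example 1 is easy to probe numerically. The following Python lines are only a sketch (the name `to_cube` and the sample point are our own choices, not part of the text): they apply the map coordinate by coordinate and check that even points with very large coordinates land strictly inside the cube $I^n$.

```python
import math

def to_cube(x):
    """Apply xi^i = (2/pi) * arctan(x^i) to every coordinate of x."""
    return [2.0 / math.pi * math.atan(c) for c in x]

# A sample point of R^5 with both small and very large coordinates.
point = [-1.0e6, -1.0, 0.0, 2.5, 1.0e6]
image = to_cube(point)

# Every coordinate of the image has absolute value strictly less than 1.
inside = all(abs(c) < 1.0 for c in image)
print(image, inside)
```

Since the arctangent is smooth and strictly increasing, each coordinate function is a diffeomorphism of $\mathbb{R}$ onto the interval $(-1, 1)$, which is what the check above samples at a few points.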


Example 2. The mapping constructed in Example 1 also establishes that the subspace of the vector space $\mathbb{R}^n$ defined by the conditions $x^{k+1} = \dots = x^n = 0$ is a $k$-dimensional surface in $\mathbb{R}^n$ (or a $k$-dimensional submanifold of $\mathbb{R}^n$).


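Example 3. The set in $\mathbb{R}^n$ defined by the system of relations

$$
\begin{aligned}
a^1_1 x^1 + \dots + a^1_k x^k + a^1_{k+1} x^{k+1} + \dots + a^1_n x^n &= 0 ,\\
\dots\dots\dots\dots\dots\dots\dots\dots\dots\dots\dots\dots &\\
a^{n-k}_1 x^1 + \dots + a^{n-k}_k x^k + a^{n-k}_{k+1} x^{k+1} + \dots + a^{n-k}_n x^n &= 0 ,
\end{aligned}
$$

provided this system has rank $n - k$, is a $k$-dimensional submanifold of $\mathbb{R}^n$. Indeed, suppose for example that the determinant

$$
\begin{vmatrix}
a^1_{k+1} & \cdots & a^1_n \\
\vdots & & \vdots \\
a^{n-k}_{k+1} & \cdots & a^{n-k}_n
\end{vmatrix}
$$

is nonzero. Then the linear transformation

$$
\begin{aligned}
t^1 &= x^1 ,\\
&\;\;\vdots\\
t^k &= x^k ,\\
t^{k+1} &= a^1_1 x^1 + \dots + a^1_n x^n ,\\
&\;\;\vdots\\
t^n &= a^{n-k}_1 x^1 + \dots + a^{n-k}_n x^n ,
\end{aligned}
$$

is obviously nondegenerate. In the coordinates $t^1,\dots,t^n$ the set is defined by the conditions $t^{k+1} = \dots = t^n = 0$, already considered in Example 2.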


Example 4. The graph of a smooth function $x^n = f(x^1,\dots,x^{n-1})$ defined in a domain $G \subset \mathbb{R}^{n-1}$ is a smooth $(n-1)$-dimensional surface in $\mathbb{R}^n$. Indeed, setting

$$
\begin{aligned}
t^i &= x^i \quad (i = 1,\dots,n-1) ,\\
t^n &= x^n - f(x^1,\dots,x^{n-1}) ,
\end{aligned}
$$

we obtain a coordinate system in which the graph of the function has the equation $t^n = 0$.


Example 5. The circle $x^2 + y^2 = 1$ in $\mathbb{R}^2$ is a one-dimensional submanifold of $\mathbb{R}^2$, as is established by the locally invertible conversion to polar coordinates $(\rho, \varphi)$ studied in the preceding section. In these coordinates the circle has equation $\rho = 1$.


Example 6. This example is a generalization of Example 3 and at the same time, as can be seen from Definition 1, gives a general form for the coordinate expression of submanifolds of $\mathbb{R}^n$.


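Let $F^i(x^1,\dots,x^n)$ $(i = 1,\dots,n-k)$ be a system of smooth functions of rank $n - k$. We shall show that the relations

$$
\begin{aligned}
F^1(x^1,\dots,x^k,x^{k+1},\dots,x^n) &= 0 ,\\
\dots\dots\dots\dots\dots\dots\dots\dots\dots &\\
F^{n-k}(x^1,\dots,x^k,x^{k+1},\dots,x^n) &= 0
\end{aligned}
\tag{8.137}
$$

define a $k$-dimensional submanifold $S$ in $\mathbb{R}^n$.

Suppose the condition

$$
\begin{vmatrix}
\dfrac{\partial F^1}{\partial x^{k+1}} & \cdots & \dfrac{\partial F^1}{\partial x^n} \\
\vdots & & \vdots \\
\dfrac{\partial F^{n-k}}{\partial x^{k+1}} & \cdots & \dfrac{\partial F^{n-k}}{\partial x^n}
\end{vmatrix}(x_0) \neq 0
\tag{8.138}
$$

holds at a point $x_0 \in S$. Then by the inverse function theorem the transformation

$$
\begin{aligned}
t^i &= x^i \quad (i = 1,\dots,k) ,\\
t^i &= F^{i-k}(x^1,\dots,x^n) \quad (i = k+1,\dots,n)
\end{aligned}
$$

is a diffeomorphism of a neighborhood of this point.

In the new coordinates $t^1,\dots,t^n$ the original system will have the form $t^{k+1} = \dots = t^n = 0$; thus, $S$ is a $k$-dimensional smooth surface in $\mathbb{R}^n$.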


Example 7. The set $E$ of points of the plane $\mathbb{R}^2$ satisfying the equation $x^2 - y^2 = 0$ consists of two lines that intersect at the origin. This set is not a one-dimensional submanifold of $\mathbb{R}^2$ (verify this!) precisely because of this point of intersection.

If the origin $0 \in \mathbb{R}^2$ is removed from $E$, then the set $E \setminus 0$ will now obviously satisfy Definition 1. We remark that the set $E \setminus 0$ is not connected. It consists of four pairwise disjoint rays.

Thus a $k$-dimensional surface in $\mathbb{R}^n$ satisfying Definition 1 may happen to be a disconnected subset consisting of several connected components (and these components are connected $k$-dimensional surfaces). A surface in $\mathbb{R}^n$ is often taken to mean a connected $k$-dimensional surface. Just now we shall be interested in the problem of finding extrema of functions defined on surfaces. These are local problems, and therefore connectivity will not manifest itself in them.


Example 8. If a smooth mapping $f : G \to \mathbb{R}^n$ of the domain $G \subset \mathbb{R}^k$ defined in coordinate form by (8.136) has rank $k$ at the point $t_0 \in G$, then there exists a neighborhood $U(t_0) \subset G$ of this point whose image $f(U(t_0)) \subset \mathbb{R}^n$ is a smooth surface in $\mathbb{R}^n$.

Indeed, as already noted above, in this case relations (8.136) can be replaced by the equivalent system

$$
\begin{aligned}
x^{k+1} &= \varphi^{k+1}(x^1,\dots,x^k) ,\\
\dots\dots\dots\dots\dots\dots &\\
x^n &= \varphi^n(x^1,\dots,x^k)
\end{aligned}
\tag{8.139}
$$

in some neighborhood $U(t_0)$ of $t_0 \in G$. (For simplicity of notation, we assume that the system $f^1,\dots,f^k$ has rank $k$.) Setting

$$F^i(x^1,\dots,x^n) = x^{k+i} - \varphi^{k+i}(x^1,\dots,x^k) \quad (i = 1,\dots,n-k) ,$$


we write the system (8.139) in the form (8.137). Since relations (8.138) are satisfied, Example 6 guarantees that the set $f(U(t_0))$ is indeed a $k$-dimensional smooth surface in $\mathbb{R}^n$.

In studying the law of motion $x = x(t)$ of a point mass in $\mathbb{R}^3$, starting from the relation

$$x(t) = x(0) + x'(0)t + o(t) \quad \text{as } t \to 0 \tag{8.140}$$

and assuming that the point $t = 0$ is not a critical point of the mapping $\mathbb{R} \ni t \mapsto x(t) \in \mathbb{R}^3$, that is, $x'(0) \neq 0$, we defined the line tangent to the trajectory at the point $x(0)$ as the linear subset of $\mathbb{R}^3$ given in parametric form by the equation

$$x - x_0 = x'(0)t \tag{8.141}$$

or the equation

$$x - x_0 = \xi \cdot t , \tag{8.142}$$

where $x_0 = x(0)$ and $\xi = x'(0)$ is a direction vector of the line.

In essence, we did a similar thing in defining the tangent plane to the graph of a function $z = f(x,y)$ in $\mathbb{R}^3$. Indeed, supplementing the relation $z = f(x,y)$ with the trivial equalities $x = x$ and $y = y$, we obtain a mapping $\mathbb{R}^2 \ni (x,y) \mapsto (x,y,f(x,y)) \in \mathbb{R}^3$ to which the tangent at the point $(x_0,y_0)$ is the linear mapping

$$
\begin{pmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ f'_x(x_0,y_0) & f'_y(x_0,y_0) \end{pmatrix}
\begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix} ,
\tag{8.143}
$$

where $z_0 = f(x_0,y_0)$.

Setting $t = (x - x_0, y - y_0)$ and $x = (x - x_0, y - y_0, z - z_0)$ here, and denoting the Jacobi matrix in (8.143) for this transformation by $x'(0)$, we remark that its rank is two and that in this notation relation (8.143) has the form (8.141).

The peculiarity of relation (8.143) is that only the last equality in the set of three equalities

$$
\begin{aligned}
x - x_0 &= x - x_0 ,\\
y - y_0 &= y - y_0 ,\\
z - z_0 &= f'_x(x_0,y_0)(x - x_0) + f'_y(x_0,y_0)(y - y_0) ,
\end{aligned}
\tag{8.144}
$$

to which it is equivalent is a nontrivial relation. That is precisely the reason it is retained as the equation defining the plane tangent to the graph of $z = f(x,y)$ at $(x_0,y_0,z_0)$.

This observation can now be used to give the definition of the $k$-dimensional plane tangent to a $k$-dimensional smooth surface $S \subset \mathbb{R}^n$.

It can be seen from Definition 1 of a surface that in a neighborhood of each of its points $x_0 \in S$ a $k$-dimensional surface $S$ can be defined parametrically, that is, using mappings $I^k \ni (t^1,\dots,t^k) \mapsto (x^1,\dots,x^n) \in S$. Such a parametrization can be taken to be the restriction of the mapping $\varphi^{-1} : I^n \to U(x_0)$ to the $k$-dimensional plane $t^{k+1} = \dots = t^n = 0$ (see Fig. 8.10).

Since $\varphi^{-1}$ is a diffeomorphism, the Jacobian of the mapping $\varphi^{-1} : I^n \to U(x_0)$ is nonzero at each point of the cube $I^n$. But then the mapping $I^k \ni (t^1,\dots,t^k) \mapsto (x^1,\dots,x^n) \in S$ obtained by restricting $\varphi^{-1}$ to this plane must also have rank $k$ at each point of $I^k$.

Now setting $(t^1,\dots,t^k) = t \in I^k$ and denoting the mapping $I^k \ni t \mapsto x \in S$ by $x = x(t)$, we obtain a local parametric representation of the surface $S$ possessing the property expressed by (8.140), on the basis of which we take Eq. (8.141) as the equation of the tangent space or tangent plane to the surface $S \subset \mathbb{R}^n$ at $x_0 \in S$.

Thus we adopt the following definition. 


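Definition 2. If a $k$-dimensional surface $S \subset \mathbb{R}^n$, $1 \le k \le n$, is defined parametrically in a neighborhood of $x_0 \in S$ by means of a smooth mapping $(t^1,\dots,t^k) = t \mapsto x = (x^1,\dots,x^n)$ such that $x_0 = x(0)$ and the matrix $x'(0)$ has rank $k$, then the $k$-dimensional surface in $\mathbb{R}^n$ defined parametrically by the matrix equality (8.141) is called the tangent plane or tangent space to the surface $S$ at $x_0 \in S$.


In coordinate form the following system of equations corresponds to Eq. (8.141):

$$
\begin{aligned}
x^1 - x^1_0 &= \frac{\partial x^1}{\partial t^1}(0)\,t^1 + \dots + \frac{\partial x^1}{\partial t^k}(0)\,t^k ,\\
\dots\dots\dots\dots\dots\dots\dots\dots\dots &\\
x^n - x^n_0 &= \frac{\partial x^n}{\partial t^1}(0)\,t^1 + \dots + \frac{\partial x^n}{\partial t^k}(0)\,t^k .
\end{aligned}
\tag{8.145}
$$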

We shall denote$^{10}$ the tangent space to the surface $S$ at $x_0 \in S$, as before, by $TS_{x_0}$.

10 This is a slight departure from the usual notation $T_xS$ or $T_x(S)$.

An important and useful exercise, which the reader can do independently, is to prove the invariance of the definition of the tangent space and to verify that the linear mapping $t \mapsto x'(0)t$ tangent to the mapping $t \mapsto x(t)$, which defines the surface $S$ locally, provides a mapping of the space $\mathbb{R}^k = T\mathbb{R}^k_0$ onto the plane $TS_{x(0)}$ (see Problem 3 at the end of this section).

Let us now determine the form of the equation of the tangent plane to the $k$-dimensional surface $S$ defined in $\mathbb{R}^n$ by the system (8.137). For definiteness we shall assume that condition (8.138) holds in a neighborhood of the point $x_0 \in S$.

Setting $(x^1,\dots,x^k) = u$, $(x^{k+1},\dots,x^n) = v$, $(F^1,\dots,F^{n-k}) = F$, we write the system (8.137) in the form

$$F(u,v) = 0 , \tag{8.146}$$

and (8.138) as

$$\det F'_v(u,v) \neq 0 . \tag{8.147}$$

Using the implicit function theorem, in a neighborhood of the point $(u_0,v_0) = (x^1_0,\dots,x^k_0,x^{k+1}_0,\dots,x^n_0)$ we pass from relation (8.146) to the equivalent relation

$$v = f(u) , \tag{8.148}$$

which, when we supplement it with the identity $u = u$, yields the parametric representation of the surface $S$ in a neighborhood of $x_0 \in S$:

$$
\begin{aligned}
u &= u ,\\
v &= f(u) .
\end{aligned}
\tag{8.149}
$$

On the basis of Definition 2 we obtain from (8.149) the parametric equation

$$
\begin{aligned}
u - u_0 &= E \cdot t ,\\
v - v_0 &= f'(u_0) \cdot t
\end{aligned}
\tag{8.150}
$$

of the tangent plane; here $E$ is the identity matrix and $t = u - u_0$.

Just as was done in the case of the system (8.144), we retain in the system (8.150) only the nontrivial equation

$$v - v_0 = f'(u_0)(u - u_0) , \tag{8.151}$$

which contains the connection of the variables $x^1,\dots,x^k$ with the variables $x^{k+1},\dots,x^n$ that determine the tangent space.

Using the relation

$$f'(u_0) = -\big[F'_v(u_0,v_0)\big]^{-1}\big[F'_u(u_0,v_0)\big] ,$$

which follows from the implicit function theorem, we rewrite (8.151) as

$$F'_u(u_0,v_0)(u - u_0) + F'_v(u_0,v_0)(v - v_0) = 0 ,$$

from which, after returning to the variables $(x^1,\dots,x^n) = x$, we obtain the equation we are seeking for the tangent space $TS_{x_0} \subset \mathbb{R}^n$, namely

$$F'_x(x_0)(x - x_0) = 0 . \tag{8.152}$$


In coordinate representation Eq. (8.152) is equivalent to the system of equations

$$
\begin{aligned}
\frac{\partial F^1}{\partial x^1}(x_0)(x^1 - x^1_0) + \dots + \frac{\partial F^1}{\partial x^n}(x_0)(x^n - x^n_0) &= 0 ,\\
\dots\dots\dots\dots\dots\dots\dots\dots\dots\dots\dots\dots &\\
\frac{\partial F^{n-k}}{\partial x^1}(x_0)(x^1 - x^1_0) + \dots + \frac{\partial F^{n-k}}{\partial x^n}(x_0)(x^n - x^n_0) &= 0 .
\end{aligned}
\tag{8.153}
$$

By hypothesis the rank of this system is $n - k$, and hence it defines a $k$-dimensional plane in $\mathbb{R}^n$.

The affine equation (8.152) is equivalent (given the point $x_0$) to the vector equation

$$F'_x(x_0) \cdot \xi = 0 , \tag{8.154}$$

in which $\xi = x - x_0$.

Hence the vector $\xi$ lies in the plane $TS_{x_0}$ tangent at $x_0 \in S$ to the surface $S \subset \mathbb{R}^n$ defined by the equation $F(x) = 0$ if and only if it satisfies condition (8.154). Thus $TS_{x_0}$ can be regarded as the vector space consisting of the vectors $\xi$ that satisfy (8.154).

It is this fact that motivates the use of the term tangent space.
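To make relations (8.151) and (8.154) concrete, here is a small numerical sketch. The surface (the unit sphere $x^2 + y^2 + z^2 - 1 = 0$), the point $x_0 = (1/3, 2/3, 2/3)$, and all function names are our own illustrative assumptions. With $u = (x, y)$ and $v = z$, the code compares the formula $f'(u_0) = -[F'_v]^{-1}[F'_u]$ with a finite-difference derivative of $f$ and checks that the resulting tangent vectors satisfy (8.154).

```python
import math

def grad_F(x, y, z):
    """Gradient of F(x, y, z) = x^2 + y^2 + z^2 - 1."""
    return (2.0 * x, 2.0 * y, 2.0 * z)

def f(x, y):
    """Upper hemisphere z = f(x, y), i.e. F = 0 solved for z > 0."""
    return math.sqrt(1.0 - x * x - y * y)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x0, y0 = 1.0 / 3.0, 2.0 / 3.0
z0 = f(x0, y0)
Fx, Fy, Fz = grad_F(x0, y0, z0)

# f'(u0) = -[F'_v]^{-1} [F'_u]; here F'_v is the single number Fz.
fx, fy = -Fx / Fz, -Fy / Fz

# Central finite differences for the same partial derivatives of f.
h = 1.0e-6
fx_num = (f(x0 + h, y0) - f(x0 - h, y0)) / (2.0 * h)
fy_num = (f(x0, y0 + h) - f(x0, y0 - h)) / (2.0 * h)

# Columns of the parametrization (u, f(u)) span the tangent plane.
xi_1 = (1.0, 0.0, fx)
xi_2 = (0.0, 1.0, fy)
residuals = (dot((Fx, Fy, Fz), xi_1), dot((Fx, Fy, Fz), xi_2))
print(fx, fx_num, fy, fy_num, residuals)
```

Both residuals vanish up to rounding, in agreement with the description of $TS_{x_0}$ as the set of solutions of (8.154).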

Let us now prove the following proposition, which we have already en- 
countered in a special case (see Sect. 6.4). 


Proposition. The space $TS_{x_0}$ tangent to a smooth surface $S \subset \mathbb{R}^n$ at a point $x_0 \in S$ consists of the vectors tangent to smooth curves lying on the surface $S$ and passing through the point $x_0$.


Proof. Let the surface $S$ be defined in a neighborhood of the point $x_0 \in S$ by a system of equations (8.137), which we write briefly as

$$F(x) = 0 , \tag{8.155}$$

where $F = (F^1,\dots,F^{n-k})$, $x = (x^1,\dots,x^n)$. Let $\Gamma : I \to S$ be an arbitrary smooth path with support on $S$. Taking $I = \{t \in \mathbb{R} \mid |t| < 1\}$, we shall assume that $x(0) = x_0$. Since $x(t) \in S$ for $t \in I$, after substituting $x(t)$ into Eq. (8.155), we obtain

$$F(x(t)) = 0 \tag{8.156}$$

for $t \in I$. Differentiating this identity with respect to $t$, we find that

$$F'_x(x(t)) \cdot x'(t) = 0 .$$

In particular, when $t = 0$, setting $\xi = x'(0)$, we obtain

$$F'_x(x_0)\,\xi = 0 ,$$

that is, the vector $\xi$ tangent to the trajectory at $x_0$ (at time $t = 0$) satisfies Eq. (8.154) of the tangent space $TS_{x_0}$.

We now show that for every vector $\xi$ satisfying Eq. (8.154) there exists a smooth path $\Gamma : I \to S$ that defines a curve on $S$ passing through $x_0$ at $t = 0$ and having the velocity vector $\xi$ at time $t = 0$.

This will simultaneously establish the existence of smooth curves on $S$ passing through $x_0$, which we assumed implicitly in the proof of the first part of the proposition.

Suppose for definiteness that condition (8.138) holds. Then, knowing the first $k$ coordinates $\xi^1,\dots,\xi^k$ of the vector $\xi = (\xi^1,\dots,\xi^k,\xi^{k+1},\dots,\xi^n)$, we determine the other coordinates $\xi^{k+1},\dots,\xi^n$ uniquely from Eq. (8.154) (which is equivalent to the system (8.153)). Thus, if we establish that a vector $\tilde{\xi} = (\xi^1,\dots,\xi^k,\tilde{\xi}^{k+1},\dots,\tilde{\xi}^n)$ satisfies Eq. (8.154), we can conclude that $\tilde{\xi} = \xi$. We shall make use of this fact.

Again, as was done above, we introduce for convenience the notation $u = (x^1,\dots,x^k)$, $v = (x^{k+1},\dots,x^n)$, $x = (x^1,\dots,x^n) = (u,v)$, and $F(x) = F(u,v)$. Then Eq. (8.155) will have the form (8.146) and condition (8.138) will have the form (8.147). In the subspace $\mathbb{R}^k \subset \mathbb{R}^n$ of the variables $x^1,\dots,x^k$ we choose a parametrically defined line

$$
\begin{aligned}
x^1 - x^1_0 &= \xi^1 t ,\\
\dots\dots\dots\dots &\\
x^k - x^k_0 &= \xi^k t ,
\end{aligned}
\qquad t \in \mathbb{R} ,
$$

having direction vector $(\xi^1,\dots,\xi^k)$, which we denote $\xi_u$. In more abbreviated notation this line can be written as

$$u = u_0 + \xi_u t . \tag{8.157}$$

Solving Eq. (8.146) for $v$, by the implicit function theorem we obtain a smooth function (8.148), which, when the right-hand side of Eq. (8.157) is substituted as its argument and (8.157) is taken account of, yields a smooth curve in $\mathbb{R}^n$ defined as follows:

$$
\begin{aligned}
u &= u_0 + \xi_u t ,\\
v &= f(u_0 + \xi_u t) ,
\end{aligned}
\qquad t \in U(0) \subset \mathbb{R} .
\tag{8.158}
$$

Since $F(u, f(u)) = 0$, this curve obviously lies on the surface $S$. Moreover, it is clear from Eqs. (8.158) that at $t = 0$ the curve passes through the point $(u_0,v_0) = (x^1_0,\dots,x^k_0,x^{k+1}_0,\dots,x^n_0) = x_0 \in S$.

Differentiating the identity

$$F(u(t),v(t)) = F\big(u_0 + \xi_u t,\ f(u_0 + \xi_u t)\big) = 0$$

with respect to $t$, we obtain for $t = 0$

$$F'_u(u_0,v_0)\,\xi_u + F'_v(u_0,v_0)\,\tilde{\xi}_v = 0 ,$$

where $\tilde{\xi}_v = v'(0) = (\tilde{\xi}^{k+1},\dots,\tilde{\xi}^n)$. This equality shows that the vector $\tilde{\xi} = (\xi_u,\tilde{\xi}_v) = (\xi^1,\dots,\xi^k,\tilde{\xi}^{k+1},\dots,\tilde{\xi}^n)$ satisfies Eq. (8.154). Thus by the remark made above, we conclude that $\tilde{\xi} = \xi$. But the vector $\tilde{\xi}$ is the velocity vector at $t = 0$ for the trajectory (8.158). The proposition is now proved. □
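The construction (8.158) used in the second half of the proof can be followed numerically. In the sketch below, which is only an illustration under our own assumptions (the unit sphere as $S$, the point $u_0 = (1/3, 2/3)$, the direction $\xi_u = (1, -1)$, and hypothetical function names), we build the curve $t \mapsto (u_0 + \xi_u t,\ f(u_0 + \xi_u t))$, estimate its velocity at $t = 0$ by central differences, and check that this velocity satisfies (8.154).

```python
import math

def F(x, y, z):
    """Defining function of the unit sphere: F = 0 on S."""
    return x * x + y * y + z * z - 1.0

def f(x, y):
    """Solution v = f(u) of F(u, v) = 0 on the upper hemisphere."""
    return math.sqrt(1.0 - x * x - y * y)

def curve(t, u0, xi_u):
    """The curve (8.158): u = u0 + xi_u * t, v = f(u)."""
    x = u0[0] + xi_u[0] * t
    y = u0[1] + xi_u[1] * t
    return (x, y, f(x, y))

u0 = (1.0 / 3.0, 2.0 / 3.0)
xi_u = (1.0, -1.0)

# Velocity at t = 0 by central differences.
h = 1.0e-6
p_plus = curve(h, u0, xi_u)
p_minus = curve(-h, u0, xi_u)
velocity = tuple((a - b) / (2.0 * h) for a, b in zip(p_plus, p_minus))

# Check of (8.154): grad F(x0) . velocity should vanish.
x0 = curve(0.0, u0, xi_u)
grad_F_x0 = tuple(2.0 * c for c in x0)
tangency = sum(g * v for g, v in zip(grad_F_x0, velocity))
print(velocity, tangency)
```

The first two components of the velocity reproduce $\xi_u$, and the third is the uniquely determined remaining coordinate, as the proof asserts.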




8.7.3 Extrema with Constraint 


a. Statement of the Problem One of the most brilliant and well-known 
achievements of differential calculus is the collection of recipes it provides 
for finding the extrema of functions. The necessary conditions and sufficient 
differential tests for an extremum that we obtained from Taylor’s theorem 
apply, as we have noted, to interior extrema. 

In other words, these results are applicable only to the study of the behavior of functions $\mathbb{R}^n \ni x \mapsto f(x) \in \mathbb{R}$ in a neighborhood of a point $x_0 \in \mathbb{R}^n$, when the argument $x$ can assume any value in some neighborhood of $x_0$ in $\mathbb{R}^n$.

Frequently a situation that is more complicated and from the practical 
point of view even more interesting arises, in which one seeks an extremum of 
a function under certain constraints that limit the domain of variation of the 
argument. A typical example is the isoperimetric problem, in which we seek 
a body of maximal volume subject to the condition that its boundary surface 
has a fixed area. To obtain a mathematical expression for such a problem 
that will be accessible to us, we shall simplify the statement and assume that 
the problem is to choose from the set of rectangles having a fixed perimeter 
$2p$ the one having the largest area $\sigma$. Denoting the lengths of the sides of the rectangle by $x$ and $y$, we write

$$
\begin{aligned}
\sigma(x,y) &= x \cdot y ,\\
x + y &= p .
\end{aligned}
$$

Thus we need to find an extremum of the function $\sigma(x,y)$ under the condition that the variables $x$ and $y$ are connected by the equation $x + y = p$. Therefore, the extremum is being sought only on the set of points of $\mathbb{R}^2$ satisfying this relation. This particular problem, of course, can be solved without difficulty: it suffices to write $y = p - x$ and substitute this expression into the formula for $\sigma(x,y)$, then find the maximum of the function $x(p - x)$ by the usual methods. We needed this example only to explain the statement of the problem itself.
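The elimination just described is easy to try out. The following sketch is only an illustration: the value $p = 10$, the grid size, and the names `area` and `best_x` are our own choices. It substitutes $y = p - x$ and searches a uniform grid on $[0, p]$ for the largest value of $x(p - x)$.

```python
def area(x, p):
    """Area of the rectangle with sides x and y = p - x."""
    return x * (p - x)

p = 10.0          # half of the fixed perimeter (illustrative value)
n = 1000          # number of grid steps on the segment [0, p]
candidates = [p * k / n for k in range(n + 1)]

# Grid point at which the area is largest.
best_x = max(candidates, key=lambda x: area(x, p))
best_area = area(best_x, p)
print(best_x, p - best_x, best_area)
```

The search returns the square $x = y = p/2$, which agrees with the answer obtained by setting the derivative of $x(p - x)$ equal to zero.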
In general the problem of an extremum with constraint usually amounts to finding an extremum for a real-valued function

$$y = f(x^1,\dots,x^n) \tag{8.159}$$

of $n$ variables under the condition that these variables must satisfy a system of equations

$$
\begin{aligned}
F^1(x^1,\dots,x^n) &= 0 ,\\
\dots\dots\dots\dots\dots &\\
F^m(x^1,\dots,x^n) &= 0 .
\end{aligned}
\tag{8.160}
$$

Since we are planning to obtain differential conditions for an extremum, we shall assume that all these functions are differentiable and even continuously differentiable. If the rank of the system of functions $F^1,\dots,F^m$ is $n - k$, conditions (8.160) define a $k$-dimensional smooth surface $S$ in $\mathbb{R}^n$, and from the geometric point of view the problem of extremum with constraint amounts to finding an extremum of the function $f$ on the surface $S$. More precisely, we are considering the restriction $f|_S$ of the function $f$ to the surface $S$ and seeking an extremum of that function.

The meaning of the concept of a local extremum itself here, of course, remains the same as before, that is, a point $x_0 \in S$ is a local extremum of $f$ on $S$, or, more briefly, of $f|_S$, if there is a neighborhood$^{11}$ $U_S(x_0)$ of $x_0$ in $S \subset \mathbb{R}^n$ such that $f(x) \ge f(x_0)$ for any point $x \in U_S(x_0)$ (in which case $x_0$ is a local minimum) or $f(x) \le f(x_0)$ (and then $x_0$ is a local maximum). If these inequalities are strict for $x \in U_S(x_0) \setminus x_0$, then the extremum, as before, will be called strict.


b. A Necessary Condition for an Extremum with Constraint 


Theorem 1. Let $f : D \to \mathbb{R}$ be a function defined on an open set $D \subset \mathbb{R}^n$ and belonging to $C^{(1)}(D;\mathbb{R})$. Let $S$ be a smooth surface in $D$.

A necessary condition for a point $x_0 \in S$ that is noncritical for $f$ to be a local extremum of $f|_S$ is that

$$TS_{x_0} \subset TN_{x_0} , \tag{8.161}$$

where $TS_{x_0}$ is the tangent space to the surface $S$ at $x_0$ and $TN_{x_0}$ is the tangent space to the level surface $N = \{x \in D \mid f(x) = f(x_0)\}$ of $f$ to which $x_0$ belongs.


We begin by remarking that the requirement that the point $x_0$ be noncritical for $f$ is not an essential restriction in the context of the problem of finding an extremum with constraint, which we are discussing. Indeed, even if the point $x_0 \in D$ were a critical point of the function $f : D \to \mathbb{R}$ or an extremum of the function, it is clear that it would still be a possible or actual extremum respectively for the function $f|_S$. Thus, the new element in this problem is precisely that the function $f|_S$ may have critical points and extrema that are different from those of $f$.


Proof. We choose an arbitrary vector $\xi \in TS_{x_0}$ and a smooth path $x = x(t)$ on $S$ that passes through this point at $t = 0$ and for which the vector $\xi$ is the velocity at $t = 0$, that is,

$$\frac{dx}{dt}(0) = \xi . \tag{8.162}$$

If $x_0$ is an extremum of the function $f|_S$, the smooth function $f(x(t))$ must have an extremum at $t = 0$. By the necessary condition for an extremum, its derivative must vanish at $t = 0$, that is, we must have

$$f'(x_0) \cdot \xi = 0 , \tag{8.163}$$

where

$$f'(x_0) = \left(\frac{\partial f}{\partial x^1},\dots,\frac{\partial f}{\partial x^n}\right)(x_0) , \qquad \xi = (\xi^1,\dots,\xi^n) .$$

Since $x_0$ is a noncritical point of $f$, condition (8.163) is equivalent to the condition that $\xi \in TN_{x_0}$, for relation (8.163) is precisely the equation of the tangent space $TN_{x_0}$.

Thus we have proved that $TS_{x_0} \subset TN_{x_0}$. □

11 We recall that $U_S(x_0) = S \cap U(x_0)$, where $U(x_0)$ is a neighborhood of $x_0$ in $\mathbb{R}^n$.


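If the surface $S$ is defined by the system of equations (8.160) in a neighborhood of $x_0$, then the space $TS_{x_0}$, as we know, is defined by the system of linear equations

$$
\begin{aligned}
\frac{\partial F^1}{\partial x^1}(x_0)\,\xi^1 + \dots + \frac{\partial F^1}{\partial x^n}(x_0)\,\xi^n &= 0 ,\\
\dots\dots\dots\dots\dots\dots\dots\dots\dots &\\
\frac{\partial F^m}{\partial x^1}(x_0)\,\xi^1 + \dots + \frac{\partial F^m}{\partial x^n}(x_0)\,\xi^n &= 0 .
\end{aligned}
\tag{8.164}
$$

The space $TN_{x_0}$ is defined by the equation

$$\frac{\partial f}{\partial x^1}(x_0)\,\xi^1 + \dots + \frac{\partial f}{\partial x^n}(x_0)\,\xi^n = 0 , \tag{8.165}$$

and, since every solution of (8.164) is a solution of (8.165), the latter equation is a consequence of (8.164).

It follows from these considerations that the relation $TS_{x_0} \subset TN_{x_0}$ is equivalent to the analytic statement that the vector $\operatorname{grad} f(x_0)$ is a linear combination of the vectors $\operatorname{grad} F^i(x_0)$ $(i = 1,\dots,m)$, that is,

$$\operatorname{grad} f(x_0) = \sum_{i=1}^{m} \lambda_i \operatorname{grad} F^i(x_0) . \tag{8.166}$$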


Taking account of this way of writing the necessary condition for an extremum of a function (8.159) whose variables are connected by (8.160), Lagrange proposed using the following auxiliary function when seeking a constrained extremum:

$$L(x,\lambda) = f(x) - \sum_{i=1}^{m} \lambda_i F^i(x) \tag{8.167}$$

in $n + m$ variables $(x,\lambda) = (x^1,\dots,x^n,\lambda_1,\dots,\lambda_m)$.

This function is called the Lagrange function and the method of using it is the method of Lagrange multipliers.

The function (8.167) is convenient because the necessary conditions for an extremum of it, regarded as a function of $(x,\lambda) = (x^1,\dots,x^n,\lambda_1,\dots,\lambda_m)$, are precisely (8.166) and (8.160). Indeed,

$$
\begin{aligned}
\frac{\partial L}{\partial x^j}(x,\lambda) &= \frac{\partial f}{\partial x^j}(x) - \sum_{i=1}^{m} \lambda_i \frac{\partial F^i}{\partial x^j}(x) = 0 \quad (j = 1,\dots,n) ,\\
\frac{\partial L}{\partial \lambda_i}(x,\lambda) &= -F^i(x) = 0 \quad (i = 1,\dots,m) .
\end{aligned}
\tag{8.168}
$$

Thus, in seeking an extremum of a function (8.159) whose variables are subject to the constraints (8.160), one can write the Lagrange function (8.167) with undetermined multipliers and look for its critical points. If it is possible to find $x_0 = (x^1_0,\dots,x^n_0)$ from the system (8.168) without finding $\lambda = (\lambda_1,\dots,\lambda_m)$, then, as far as the original problem is concerned, that is what should be done.
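As a sketch of how the system (8.168) looks in the simplest case, consider again the rectangle problem with $f = \sigma(x,y) = xy$ and the single constraint $F(x,y) = x + y - p = 0$. The code below is an illustration only (the function names and the value $p = 10$ are ours): it writes out the three partial derivatives of $L(x,y,\lambda) = xy - \lambda(x + y - p)$, solves the system by hand, and verifies the solution.

```python
def lagrange_gradient(x, y, lam, p):
    """Partial derivatives of L(x, y, lam) = x*y - lam*(x + y - p)."""
    dL_dx = y - lam            # first group of equations in (8.168)
    dL_dy = x - lam
    dL_dlam = -(x + y - p)     # second group: minus the constraint
    return (dL_dx, dL_dy, dL_dlam)

def solve_rectangle(p):
    """Solve the system by hand: the first two equations give
    x = y = lam, and the constraint then gives 2*lam = p."""
    lam = p / 2.0
    return (lam, lam, lam)

p = 10.0
x, y, lam = solve_rectangle(p)
gradient = lagrange_gradient(x, y, lam, p)
print((x, y, lam), gradient)
```

All three components of the gradient vanish at $x = y = \lambda = p/2$, so the only critical point of the Lagrange function is the square, as expected.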

As can be seen from (8.166), the multipliers $\lambda_i$ $(i = 1,\dots,m)$ are uniquely determined if the vectors $\operatorname{grad} F^i(x_0)$ $(i = 1,\dots,m)$ are linearly independent. The independence of these vectors is equivalent to the statement that the rank of the system (8.164) is $m$, that is, that all the equations in this system are essential (none of them is a consequence of the others).

This is usually the case, since it is assumed that all the relations (8.160) are independent, and the rank of the system of functions $F^1,\dots,F^m$ is $m$ at every point $x \in X$.

The Lagrange function is often written as

$$L(x,\lambda) = f(x) + \sum_{i=1}^{m} \lambda_i F^i(x) ,$$

which differs from the preceding expression only in the inessential replacement of $\lambda_i$ by $-\lambda_i$.$^{12}$


Example 9. Let us find the extrema of a symmetric quadratic form

$$f(x) = \sum_{i,j=1}^{n} a_{ij} x^i x^j \quad (a_{ij} = a_{ji}) \tag{8.169}$$

on the sphere

$$F(x) = \sum_{i=1}^{n} (x^i)^2 - 1 = 0 . \tag{8.170}$$

12 In regard to the necessary criterion for an extremum with constraint, see also Problem 6 in Sect. 10.7 (Part 2).

Let us write the Lagrange function for this problem

$$L(x,\lambda) = \sum_{i,j=1}^{n} a_{ij} x^i x^j - \lambda\left(\sum_{i=1}^{n} (x^i)^2 - 1\right)$$

and the necessary conditions for an extremum of $L(x,\lambda)$, taking account of the relation $a_{ij} = a_{ji}$:

$$
\begin{aligned}
\frac{\partial L}{\partial x^i}(x,\lambda) &= 2\left(\sum_{j=1}^{n} a_{ij} x^j - \lambda x^i\right) = 0 \quad (i = 1,\dots,n) ,\\
\frac{\partial L}{\partial \lambda}(x,\lambda) &= -\left(\sum_{i=1}^{n} (x^i)^2 - 1\right) = 0 .
\end{aligned}
\tag{8.171}
$$
Multiplying the first equation by $x^i$ and summing over $i$, we find, taking account of the second relation, that the equality

$$\sum_{i,j=1}^{n} a_{ij} x^i x^j - \lambda = 0 \tag{8.172}$$

must hold at an extremum.

The system (8.171) minus the last equation can be rewritten as

$$\sum_{j=1}^{n} a_{ij} x^j = \lambda x^i \quad (i = 1,\dots,n) , \tag{8.173}$$

from which it follows that $\lambda$ is an eigenvalue of the linear operator $A$ defined by the matrix $(a_{ij})$, and $x = (x^1,\dots,x^n)$ is an eigenvector of this operator corresponding to this eigenvalue.

Since the function (8.169), which is continuous on the compact set $S = \left\{x \in \mathbb{R}^n \,\Big|\, \sum_{i=1}^{n} (x^i)^2 = 1\right\}$, must assume its maximal value at some point, the system (8.171), and hence also (8.173), must have a solution. Thus we have established along the way that every real symmetric matrix $(a_{ij})$ has at least one real eigenvalue. This is a result well-known from linear algebra, and is fundamental in the proof of the existence of a basis of eigenvectors for a symmetric operator.
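The argument can be watched at work in the plane. The sketch below is only an illustration under our own assumptions: the matrix with rows $(2, 1)$ and $(1, 2)$, the number of sample points, and all names are hypothetical choices. It maximizes the quadratic form over a fine sample of the unit circle, takes $\lambda$ to be the maximal value in accordance with (8.172), and checks that the maximizing point satisfies (8.173) up to a small error.

```python
import math

# Illustrative symmetric matrix (our choice); its eigenvalues are 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]

def quad_form(x):
    """Value of the quadratic form (8.169) at the point x."""
    return sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

# Sample the unit circle and pick the point where the form is largest.
n = 100000
points = [(math.cos(2.0 * math.pi * k / n), math.sin(2.0 * math.pi * k / n))
          for k in range(n)]
x_max = max(points, key=quad_form)

# By (8.172) the multiplier equals the value of the form at the extremum.
lam = quad_form(x_max)

# Residual of the eigenvector equation (8.173): A x - lam * x.
residual = [sum(A[i][j] * x_max[j] for j in range(2)) - lam * x_max[i]
            for i in range(2)]
print(x_max, lam, residual)
```

The maximal value found is the larger eigenvalue of the matrix, and the point where it is attained is a corresponding unit eigenvector, exactly as the compactness argument predicts.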

To show the geometric meaning of the eigenvalue $\lambda$, we remark that if $\lambda > 0$, then, passing to the coordinates $t^i = x^i/\sqrt{\lambda}$, we find, instead of (8.172),

$$\sum_{i,j=1}^{n} a_{ij} t^i t^j = 1 , \tag{8.174}$$

and, instead of (8.170),

$$\sum_{i=1}^{n} (t^i)^2 = \frac{1}{\lambda} . \tag{8.175}$$

But $\sum_{i=1}^{n} (t^i)^2$ is the square of the distance from the origin to the point $t = (t^1,\dots,t^n)$, which lies on the quadric surface (8.174). Thus if (8.174) represents, for example, an ellipsoid, then the reciprocal $1/\lambda$ of the eigenvalue $\lambda$ is the square of the length of one of its semi-axes.

This is a useful observation. It shows in particular that relations (8.171), which are necessary conditions for an extremum with constraint, are still not sufficient. After all, an ellipsoid in $\mathbb{R}^3$ has, besides its largest and smallest semi-axes, a third semi-axis whose length is intermediate between the two, in any neighborhood of whose endpoint there are both points nearer to the origin and points farther away from the origin than the endpoint. This last becomes completely obvious if we consider the ellipses obtained by taking a section of the original ellipsoid by two planes passing through the intermediate-length semi-axis, one passing through the smallest semi-axis and the other through the largest. In one of these cases the intermediate axis will be the major semi-axis of the ellipse of intersection. In the other it will be the minor semi-axis.

To what has just been said we should add that if $1/\sqrt{\lambda}$ is the length of this intermediate semi-axis, then, as can be seen from the canonical equation of an ellipsoid, $\lambda$ will be an eigenvalue of the operator $A$. Therefore the system (8.171), which expresses the necessary conditions for an extremum of the function $f|_S$, will indeed have a solution that does not give an extremum of the function.

The result obtained in Theorem 1 (the necessary condition for an extremum with constraint) is illustrated in Fig. 8.11a and 8.11b.

Fig. 8.11. ($c_1 < c_0 < c_2$)

The first of these figures explains why the point $x_0$ of the surface $S$ cannot be an extremum of $f|_S$ if $S$ is not tangent to the surface $N = \{x \in \mathbb{R}^n \mid f(x) = f(x_0) = c_0\}$ at $x_0$. It is assumed here that $\operatorname{grad} f(x_0) \neq 0$. This last condition guarantees that in a neighborhood of $x_0$ there are points of a higher, $c_2$-level surface of the function $f$ and also points of a lower, $c_1$-level surface of the function.

Since the smooth surface $S$ intersects the surface $N$, that is, the $c_0$-level surface of the smooth function $f$, it follows that $S$ will intersect both higher and lower level surfaces of $f$ in a neighborhood of $x_0$. But this means that $x_0$ cannot be an extremum of $f|_S$.

The second figure shows why, when $N$ is tangent to $S$ at $x_0$, this point may turn out to be an extremum. In the figure $x_0$ is a local maximum of $f|_S$.

These same considerations make it possible to sketch a picture whose analytic expression can show that the necessary criterion for an extremum is not sufficient.

Indeed, in accordance with Fig. 8.12, we set, for example,

$$f(x,y) = y , \qquad F(x,y) = x^3 - y = 0 .$$

Fig. 8.12.

It is then obvious that $y$ has no extremum at the point $(0,0)$ on the curve $S \subset \mathbb{R}^2$ defined by the equation $y = x^3$, even though this curve is tangent to the level line $f(x,y) = 0$ of the function $f$ at that point. We remark that $\operatorname{grad} f(0,0) = (0,1) \neq 0$.

It is obvious that this is essentially the same example that served earlier 
to illustrate the difference between the necessary and sufficient conditions for 
a classical interior extremum of a function. 
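This counterexample can be checked in a few lines. The sketch below assumes the reading $f(x,y) = y$, $F(x,y) = x^3 - y$ of the example, and the variable names are ours. It verifies that the necessary condition (8.166) holds at the origin with $\lambda = -1$, while $f$ takes values of both signs along the curve $t \mapsto (t, t^3)$ arbitrarily close to the origin.

```python
def f(x, y):
    return y

def F(x, y):
    return x ** 3 - y

# Gradients at the origin, computed by hand:
# grad f = (0, 1) and grad F = (3x^2, -1) = (0, -1) at x = 0.
grad_f = (0.0, 1.0)
grad_F = (3.0 * 0.0 ** 2, -1.0)
lam = -1.0

# Necessary condition (8.166): grad f = lam * grad F.
condition_holds = all(gf == lam * gF for gf, gF in zip(grad_f, grad_F))

# Values of f on the curve S: y = x^3 on both sides of the origin.
samples = [-0.1, -0.01, 0.01, 0.1]
on_curve = all(F(t, t ** 3) == 0.0 for t in samples)
values = [f(t, t ** 3) for t in samples]
print(condition_holds, on_curve, values)
```

The values are negative for $t < 0$ and positive for $t > 0$, so the origin is not an extremum of $f|_S$ although the necessary condition is fulfilled there.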


c. A Sufficient Condition for a Constrained Extremum We now prove 
the following sufficient condition for the presence or absence of a constrained 
extremum. 


Theorem 2. Let $f : D \to \mathbb{R}$ be a function defined on an open set $D \subset \mathbb{R}^n$ and belonging to the class $C^{(2)}(D;\mathbb{R})$; let $S$ be the surface in $D$ defined by Eqs. (8.160), where $F^i \in C^{(2)}(D;\mathbb{R})$ $(i = 1,\dots,m)$ and the rank of the system of functions $\{F^1,\dots,F^m\}$ at each point of $D$ is $m$.

Suppose that the parameters $\lambda_1,\dots,\lambda_m$ in the Lagrange function

$$L(x) = L(x;\lambda) = f(x^1,\dots,x^n) - \sum_{i=1}^{m} \lambda_i F^i(x^1,\dots,x^n)$$

have been chosen in accordance with the necessary criterion (8.166) for a constrained extremum of the function $f|_S$ at $x_0 \in S$.$^{13}$

A sufficient condition for the point $x_0$ to be an extremum of the function $f|_S$ is that the quadratic form

$$\frac{\partial^2 L}{\partial x^i \partial x^j}(x_0)\,\xi^i \xi^j \tag{8.176}$$

be either positive-definite or negative-definite for vectors $\xi \in TS_{x_0}$.

If the quadratic form (8.176) is positive-definite on $TS_{x_0}$, then $x_0$ is a strict local minimum of $f|_S$; if it is negative-definite, then $x_0$ is a strict local maximum.

A sufficient condition for the point $x_0$ not to be an extremum of $f|_S$ is that the form (8.176) assume both positive and negative values on $TS_{x_0}$.

13 When we keep $\lambda$ fixed, $L(x;\lambda)$ becomes a function depending only on $x$; we allow ourselves to denote this function $L(x)$.


Proof. We first note that $L(x) = f(x)$ for $x \in S$, so that if we show that $x_0 \in S$ is an extremum of the function $L|_S$, we shall have shown simultaneously that it is an extremum of $f|_S$.

By hypothesis, the necessary criterion (8.166) for an extremum of $f|_S$ at $x_0$ is fulfilled, so that $\operatorname{grad} L(x_0) = 0$ at this point. Hence the Taylor expansion of $L(x)$ in a neighborhood of $x_0 = (x^1_0,\dots,x^n_0)$ has the form

$$L(x) - L(x_0) = \frac{1}{2!}\,\frac{\partial^2 L}{\partial x^i \partial x^j}(x_0)(x^i - x^i_0)(x^j - x^j_0) + o\big(\|x - x_0\|^2\big) \tag{8.177}$$

as $x \to x_0$.


We now recall that, in motivating Definition 2, we noted the possibility of a local (for example, in a neighborhood of $x_0 \in S$) parametric definition of a smooth $k$-dimensional surface $S$ (in the present case, $k = n - m$).

In other words, there exists a smooth mapping

$$\mathbb{R}^k \ni (t^1,\dots,t^k) = t \mapsto x = (x^1,\dots,x^n) \in \mathbb{R}^n$$

(as before, we shall write it in the form $x = x(t)$), under which a neighborhood of the point $0 = (0,\dots,0) \in \mathbb{R}^k$ maps bijectively to some neighborhood of $x_0$ on $S$, and $x_0 = x(0)$.

We remark that the relation

$$x(t) - x(0) = x'(0)t + o(\|t\|) \quad \text{as } t \to 0 ,$$

which expresses the differentiability of the mapping $t \mapsto x(t)$ at $t = 0$, is equivalent to the $n$ coordinate equalities

$$x^i(t) - x^i(0) = \frac{\partial x^i}{\partial t^\alpha}(0)\,t^\alpha + o(\|t\|) \quad (i = 1,\dots,n) , \tag{8.178}$$

in which the index $\alpha$ ranges over the integers from 1 to $k$ and the summation is over this index.

It follows from these numerical equalities that

$$|x^i(t) - x^i(0)| = O(\|t\|) \quad \text{as } t \to 0$$

and hence

$$\|x(t) - x(0)\|_{\mathbb{R}^n} = O\big(\|t\|_{\mathbb{R}^k}\big) \quad \text{as } t \to 0 . \tag{8.179}$$

Using relations (8.178), (8.179), and (8.177), we find that as $t \to 0$

$$L(x(t)) - L(x(0)) = \frac{1}{2!}\,\partial_{ij} L(x_0)\,\partial_\alpha x^i(0)\,\partial_\beta x^j(0)\,t^\alpha t^\beta + o\big(\|t\|^2\big) . \tag{8.177'}$$


Hence under the assumption of positive- or negative-definiteness of the form

$$\partial_{ij} L(x_0)\,\partial_\alpha x^i(0)\,\partial_\beta x^j(0)\,t^\alpha t^\beta \tag{8.180}$$

it follows that the function $L(x(t))$ has an extremum at $t = 0$. If the form (8.180) assumes both positive and negative values, then $L(x(t))$ has no extremum at $t = 0$. But, since some neighborhood of the point $0 \in \mathbb{R}^k$ maps to a neighborhood of $x(0) = x_0 \in S$ on the surface $S$ under the mapping $t \mapsto x(t)$, we can conclude that the function $L|_S$ also will either have an extremum at $x_0$ of the same nature as the function $L(x(t))$ or, like $L(x(t))$, will not have an extremum.

Thus, it remains to verify that for vectors $\xi \in TS_{x_0}$ the expressions (8.176) and (8.180) are merely different notations for the same object.

Indeed, setting

$$\xi = x'(0)t ,$$

we obtain a vector $\xi$ tangent to $S$ at $x_0$, and if $\xi = (\xi^1,\dots,\xi^n)$, $x(t) = (x^1,\dots,x^n)(t)$, and $t = (t^1,\dots,t^k)$, then

$$\xi^j = \partial_\beta x^j(0)\,t^\beta \quad (j = 1,\dots,n) ,$$

from which it follows that the quantities (8.176) and (8.180) are the same. □
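A one-constraint example shows how the test of Theorem 2 is applied. The data below are our own illustrative assumptions and do not come from the text: $f(x,y) = x + y$ on the circle $F(x,y) = x^2 + y^2 - 1 = 0$. The necessary condition $(1,1) = \lambda(2x, 2y)$ together with the constraint gives the candidates $x = y = \lambda = \pm 1/\sqrt{2}$, and the Hessian of $L = x + y - \lambda(x^2 + y^2 - 1)$ with respect to $(x,y)$ is $-2\lambda$ times the identity matrix.

```python
import math

def form_on_tangent(x, y, lam):
    """Value of the form (8.176) for L = x + y - lam*(x^2 + y^2 - 1)
    on the tangent vector xi = (-y, x) to the circle at (x, y)."""
    xi = (-y, x)  # satisfies grad F . xi = 2x*(-y) + 2y*x = 0
    # The Hessian of L in (x, y) is -2*lam times the identity matrix.
    return -2.0 * lam * (xi[0] ** 2 + xi[1] ** 2)

def classify(x, y, lam):
    """The tangent space is one-dimensional, so the sign of the form
    on a single nonzero tangent vector decides definiteness."""
    q = form_on_tangent(x, y, lam)
    if q > 0.0:
        return "strict local minimum"
    if q < 0.0:
        return "strict local maximum"
    return "undecided"

s = 1.0 / math.sqrt(2.0)
candidates = [(s, s, s), (-s, -s, -s)]   # triples (x, y, lam)
results = [classify(x, y, lam) for (x, y, lam) in candidates]
print(results)
```

The form is negative-definite on the tangent line at the first candidate and positive-definite at the second, so these points are respectively a strict local maximum and a strict local minimum of $f|_S$, in agreement with the values $\pm\sqrt{2}$ of $f$ there.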


We note that the practical use of Theorem 2 is hindered by the fact that
only $k = n - m$ of the coordinates of the vector $\xi = (\xi^1, \ldots, \xi^n) \in TS_{x_0}$
are independent, since the coordinates of $\xi$ must satisfy the system (8.164)
defining the space $TS_{x_0}$. Thus a direct application of the Sylvester criterion
to the quadratic form (8.176) generally yields nothing in the present case:
the form (8.176) may fail to be positive- or negative-definite on $T\mathbb{R}^n_{x_0}$ and yet
be definite on $TS_{x_0}$. But if we express $m$ coordinates of the vector $\xi$ in
terms of the other $k$ coordinates by relations (8.164) and then substitute
the resulting linear forms into (8.176), we arrive at a quadratic form in $k$
variables whose positive- or negative-definiteness can be investigated using
the Sylvester criterion.
Let us clarify what has just been said by some elementary examples.

Example 10. Suppose we are given the function
$$f(x, y, z) = x^2 - y^2 + z^2$$
in the space $\mathbb{R}^3$ with coordinates $x, y, z$. We seek an extremum of this function
on the plane $S$ defined by the equation
$$F(x, y, z) = 2x - y - 3 = 0.$$
Writing the Lagrange function
$$L(x, y, z) = (x^2 - y^2 + z^2) - \lambda(2x - y - 3)$$
and the necessary conditions for an extremum
$$\frac{\partial L}{\partial x} = 2x - 2\lambda = 0,$$
$$\frac{\partial L}{\partial y} = -2y + \lambda = 0,$$
$$\frac{\partial L}{\partial z} = 2z = 0,$$
$$\frac{\partial L}{\partial \lambda} = -(2x - y - 3) = 0,$$
we find the possible extremum $p = (2, 1, 0)$.
Next we find the form (8.176):
$$\frac{1}{2}\,\partial_{ij}L\,\xi^i\xi^j = (\xi^1)^2 - (\xi^2)^2 + (\xi^3)^2.$$ (8.181)
We note that in this case the parameter $\lambda$ did not occur in this quadratic
form, and so we did not compute it.
We now write the condition $\xi \in TS_p$:
$$2\xi^1 - \xi^2 = 0.$$ (8.182)
From this equality we find $\xi^2 = 2\xi^1$ and substitute it into the form (8.181),
after which it assumes the form
$$-3(\xi^1)^2 + (\xi^3)^2,$$
where this time $\xi^1$ and $\xi^3$ are independent variables.
This last form may obviously assume both positive and negative values,
and therefore the function $f|_S$ has no extremum at $p \in S$.
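
The conclusion of Example 10 can be spot-checked numerically. The following is only an illustrative sketch: the helper names `restricted_form` and `f_on_plane` are introduced here for the check and are not notation from the text. It evaluates the form (8.181) on tangent vectors satisfying (8.182) and compares $f$ with $f(p)$ along two directions inside the plane $S$.

```python
# Illustrative check of Example 10 (helper names are hypothetical).

def f(x, y, z):
    return x**2 - y**2 + z**2

def restricted_form(xi1, xi3):
    """Form (8.181) on the tangent plane, with xi2 = 2*xi1 taken from (8.182)."""
    xi2 = 2 * xi1
    return xi1**2 - xi2**2 + xi3**2

def f_on_plane(x, z):
    """f restricted to the plane 2x - y - 3 = 0, parametrized by (x, z)."""
    return f(x, 2 * x - 3, z)

p_value = f_on_plane(2.0, 0.0)          # value of f at p = (2, 1, 0)
along_x = f_on_plane(2.0 + 1e-3, 0.0)   # small step inside S that changes x
along_z = f_on_plane(2.0, 1e-3)         # small step inside S that changes z

# The form takes values of both signs, and f moves below and above f(p).
print(restricted_form(1, 0), restricted_form(0, 1))
print(along_x < p_value < along_z)
```

Values of both signs for the restricted form, together with nearby points of $S$ where $f$ is smaller and larger than $f(p)$, are consistent with the absence of an extremum at $p$.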
Example 11. Under the hypotheses of Example 10 we replace $\mathbb{R}^3$ by $\mathbb{R}^2$ and
the function $f$ by
$$f(x, y) = x^2 - y^2,$$
retaining the condition
$$2x - y - 3 = 0,$$
which now defines a line $S$ in the plane $\mathbb{R}^2$.

We find $p = (2, 1)$ as a possible extremum.
Instead of the form (8.181) we obtain the form
$$(\xi^1)^2 - (\xi^2)^2$$ (8.183)
with the previous relation (8.182) between $\xi^1$ and $\xi^2$.
Thus the form (8.183) now has the form
$$-3(\xi^1)^2$$
on $TS_p$, that is, it is negative-definite. We conclude from this that the point
$p = (2, 1)$ is a local maximum of $f|_S$.


The following simple examples are instructive in many respects. On them 
one can distinctly trace the working of both the necessary and the sufficient 
conditions for constrained extrema, including the role of the parameter and 
the informal role of the Lagrange function itself. 


Example 12. On the plane $\mathbb{R}^2$ with Cartesian coordinates $(x, y)$ we are given
the function
$$f(x, y) = x^2 + y^2.$$
Let us find the extremum of this function on the ellipse given by the
canonical relation
$$F(x, y) = \frac{x^2}{a^2} + \frac{y^2}{b^2} - 1 = 0,$$
where $0 < a < b$.

It is obvious from geometric considerations that $\min f|_S = a^2$ and
$\max f|_S = b^2$. Let us obtain this result on the basis of the procedures
recommended by Theorems 1 and 2.

By writing the Lagrange function
$$L(x, y, \lambda) = (x^2 + y^2) - \lambda\left(\frac{x^2}{a^2} + \frac{y^2}{b^2} - 1\right)$$
and solving the equation $dL = 0$, that is, the system
$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} = \frac{\partial L}{\partial \lambda} = 0$, we
find the solutions
$$(x, y, \lambda) = (\pm a, 0, a^2),\ (0, \pm b, b^2).$$
Now in accordance with Theorem 2 we write and study the quadratic form
$\frac{1}{2}d^2L\,\xi^2$, the second term of the Taylor expansion of the Lagrange function
in a neighborhood of the corresponding points:
$$\frac{1}{2}\,d^2L\,\xi^2 = \left(1 - \frac{\lambda}{a^2}\right)(\xi^1)^2 + \left(1 - \frac{\lambda}{b^2}\right)(\xi^2)^2.$$
At the points $(\pm a, 0)$ of the ellipse $S$ the tangent vector $\xi = (\xi^1, \xi^2)$ has
the form $(0, \xi^2)$, and for $\lambda = a^2$ the quadratic form assumes the form
$$\left(1 - \frac{a^2}{b^2}\right)(\xi^2)^2.$$
Taking account of the condition $0 < a < b$, we conclude that this form
is positive-definite and hence at the points $(\pm a, 0) \in S$ there is a strict local
(and in this case obviously also global) minimum of the function $f|_S$, that is,
$$\min f|_S = a^2.$$
Similarly we find the form
$$\left(1 - \frac{b^2}{a^2}\right)(\xi^1)^2,$$
which corresponds to the points $(0, \pm b) \in S$, and we find $\max f|_S = b^2$.


Remark. Note the role of the Lagrange function here compared with the role
of the function $f$. At the corresponding points on these tangent vectors the
differential of $f$ (like the differential of $L$) vanishes, and the quadratic form
$\frac{1}{2}d^2f\,\xi^2 = (\xi^1)^2 + (\xi^2)^2$ is positive-definite at whichever of these points it is
computed. Nevertheless, the function $f|_S$ has a strict minimum at the points
$(\pm a, 0)$ and a strict maximum at the points $(0, \pm b)$.

To understand what is going on here, look again at the proof of Theorem 2
and try to obtain relation (8.177′) by substituting $f$ for $L$ in (8.177). Note
that an additional term containing $x''(0)$ arises here. The reason it does not
vanish is that, in contrast to $dL$, the differential $df$ of $f$ is not identically zero
at the corresponding points, even though its values are indeed zero on the
tangent vectors (of the form $x'(0)$).
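
The additional term mentioned in the remark can be written out explicitly. The computation below is a sketch under the assumption that the local parametrization $x(t)$ of $S$ near $x_0 = x(0)$ is twice continuously differentiable; the specific parametrization $x(t) = (a\cos t, b\sin t)$ of the ellipse is chosen here only for illustration.

```latex
% Assumption: x(t) is of class C^2, so x(t) - x(0) = x'(0)t + (1/2)x''(0)t^2 + o(|t|^2).
\begin{align*}
f(x(t)) - f(x(0))
  &= \partial_i f(x_0)\,\partial_\alpha x^i(0)\,t^\alpha \\
  &\quad + \tfrac12\bigl[\partial_{ij} f(x_0)\,\partial_\alpha x^i(0)\,\partial_\beta x^j(0)
     + \partial_i f(x_0)\,\partial_{\alpha\beta} x^i(0)\bigr]\,t^\alpha t^\beta
     + o(\|t\|^2).
\end{align*}
% The first-order term vanishes, because df(x_0) is zero on tangent vectors.
% Example 12 at x_0 = (a, 0) with x(t) = (a cos t, b sin t):
%   x'(0) = (0, b),  x''(0) = (-a, 0),  grad f(x_0) = (2a, 0),  so
\[
f(x(t)) - f(x_0) = \tfrac12\bigl[\,2b^2 + 2a\cdot(-a)\,\bigr]t^2 + o(t^2)
                 = (b^2 - a^2)\,t^2 + o(t^2).
\]
```

With $\xi^2 = bt$ this agrees with the value $\left(1 - \frac{a^2}{b^2}\right)(\xi^2)^2$ of the form built from $L$, whereas the term $\frac{1}{2}d^2f\,\xi^2 = b^2t^2$ taken alone misses the contribution $-a^2t^2$ coming from $x''(0)$.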
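Example 13. Let us find the extrema of the function
$$f(x, y, z) = x^2 + y^2 + z^2$$
on the ellipsoid $S$ defined by the relation
$$F(x, y, z) = \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} - 1 = 0,$$
where $0 < a < b < c$.
By writing the Lagrange function
$$L(x, y, z, \lambda) = (x^2 + y^2 + z^2) - \lambda\left(\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} - 1\right),$$
in accordance with the necessary criterion for an extremum we find the
solutions of the equation $dL = 0$, that is, the system
$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} = \frac{\partial L}{\partial z} = \frac{\partial L}{\partial \lambda} = 0$:
$$(x, y, z, \lambda) = (\pm a, 0, 0, a^2),\ (0, \pm b, 0, b^2),\ (0, 0, \pm c, c^2).$$
On each respective tangent plane the quadratic form
$$\frac{1}{2}\,d^2L\,\xi^2 = \left(1 - \frac{\lambda}{a^2}\right)(\xi^1)^2 + \left(1 - \frac{\lambda}{b^2}\right)(\xi^2)^2 + \left(1 - \frac{\lambda}{c^2}\right)(\xi^3)^2$$
in each of these cases has the form
$$\left(1 - \frac{a^2}{b^2}\right)(\xi^2)^2 + \left(1 - \frac{a^2}{c^2}\right)(\xi^3)^2,$$ (a)
$$\left(1 - \frac{b^2}{a^2}\right)(\xi^1)^2 + \left(1 - \frac{b^2}{c^2}\right)(\xi^3)^2,$$ (b)
$$\left(1 - \frac{c^2}{a^2}\right)(\xi^1)^2 + \left(1 - \frac{c^2}{b^2}\right)(\xi^2)^2.$$ (c)
Since $0 < a < b < c$, it follows from Theorem 2, which gives a sufficient
criterion for the presence or absence of a constrained extremum, that
in cases (a) and (c) we have found respectively $\min f|_S = a^2$
and $\max f|_S = c^2$, while at the points $(0, \pm b, 0) \in S$ corresponding to case
(b) the function $f|_S$ has no extremum. This is in complete agreement with
the obvious geometric considerations stated in the discussion of the necessary
criterion for a constrained extremum.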
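
The sign analysis in Example 13 can be mechanized for concrete semi-axes. The sketch below is illustrative only: the function names and the sample values $a = 1$, $b = 2$, $c = 3$ are assumptions made for this check. It lists the coefficients $1 - \lambda/s^2$ that survive on each tangent plane and classifies the critical point by their signs.

```python
# Illustrative sign check for Example 13 (names and sample semi-axes are hypothetical).

def restricted_coefficients(lam, semi_axes, zero_index):
    """Coefficients 1 - lam/s**2 of the form on the tangent plane xi[zero_index] = 0."""
    return [1 - lam / s**2 for i, s in enumerate(semi_axes) if i != zero_index]

def classify(coefficients):
    """Classify a diagonal quadratic form by the signs of its coefficients."""
    if all(c > 0 for c in coefficients):
        return "minimum"
    if all(c < 0 for c in coefficients):
        return "maximum"
    if any(c > 0 for c in coefficients) and any(c < 0 for c in coefficients):
        return "no extremum"
    return "undecided"  # some coefficient is zero: Theorem 2 does not apply

semi_axes = (1.0, 2.0, 3.0)  # sample values with 0 < a < b < c
for index, s in enumerate(semi_axes):
    # At the critical point on the axis number `index`, lambda = s**2 and xi[index] = 0.
    print(index, classify(restricted_coefficients(s**2, semi_axes, index)))
```

Because the form is diagonal in the coordinates $\xi^i$, no determinant computation is needed here; for a non-diagonal restricted form one would apply the Sylvester criterion instead.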


Certain other aspects of the concepts of analysis and geometry encoun- 
tered in this section, which are sometimes quite useful, including the physical 
interpretation of the problem of a constrained extremum itself, as well as the 
necessary criterion (8.166) for it as the resolution of forces at an equilibrium 
point and the interpretation of the Lagrange multipliers as the magnitude of 
the reaction of ideal constraints, are presented in the problems and exercises 
that follow. 
8.7.4 Problems and Exercises


1. Paths and surfaces. 


a) Let $f : I \to \mathbb{R}^2$ be a mapping of class $C^{(1)}(I; \mathbb{R}^2)$ of the interval $I \subset \mathbb{R}$.
Regarding this mapping as a path in $\mathbb{R}^2$, show by example that its support $f(I)$ may
fail to be a submanifold of $\mathbb{R}^2$, while the graph of this mapping in $\mathbb{R}^3 = \mathbb{R}^1 \times \mathbb{R}^2$ is
always a one-dimensional submanifold of $\mathbb{R}^3$ whose projection into $\mathbb{R}^2$ is the support
$f(I)$ of the path.

b) Solve problem a) in the case when $I$ is an interval in $\mathbb{R}^k$ and $f \in C^{(1)}(I; \mathbb{R}^n)$.
Show that in this case the graph of the mapping $f : I \to \mathbb{R}^n$ is a smooth
$k$-dimensional surface in $\mathbb{R}^k \times \mathbb{R}^n$ whose projection on the subspace $\mathbb{R}^n$ equals $f(I)$.

c) Verify that if $f_1 : I_1 \to S$ and $f_2 : I_2 \to S$ are two smooth parametrizations
of the same $k$-dimensional surface $S \subset \mathbb{R}^n$, $f_1$ having no critical points in $I_1$ and
$f_2$ having no critical points in $I_2$, then the mappings
$$f_1^{-1} \circ f_2 : I_2 \to I_1, \qquad f_2^{-1} \circ f_1 : I_1 \to I_2$$
are smooth.


2. The sphere in $\mathbb{R}^n$.

a) On the sphere $S^2 = \{x \in \mathbb{R}^3 \mid \|x\| = 1\}$ exhibit a maximal domain of validity
for the curvilinear coordinates $(\varphi, \psi)$ obtained from polar coordinates in $\mathbb{R}^3$ (see
formula (8.116) of the preceding section) when $\rho = 1$.

b) Answer question a) in the case of the $(m - 1)$-dimensional sphere
$$S^{m-1} = \{x \in \mathbb{R}^m \mid \|x\| = 1\}$$
in $\mathbb{R}^m$ and the coordinates $(\varphi_1, \ldots, \varphi_{m-1})$ on it obtained from polar coordinates in
$\mathbb{R}^m$ (see formula (8.118) of the preceding section) at $\rho = 1$.

c) Can the sphere $S^k \subset \mathbb{R}^{k+1}$ be defined by a single coordinate system
$(t^1, \ldots, t^k)$, that is, a single diffeomorphism $f : G \to \mathbb{R}^{k+1}$ of a domain $G \subset \mathbb{R}^k$?

d) What is the smallest number of maps needed in an atlas of the Earth’s 
surface? 


e) Let us measure the distance between points of the sphere $S^2 \subset \mathbb{R}^3$ by the
length of the shortest curve lying on the sphere $S^2$ and joining these points. Such
a curve is the arc of a suitable great circle. Can there exist a local flat map of the
sphere such that all the distances between points of the sphere are proportional
(with the same coefficient of proportionality) to the distances between their images
on the map?


f) The angle between curves (whether lying on the sphere or not) at their point 
of intersection is defined as the angle between the tangents to these curves at this 
point. 

Show that there exist local flat maps of a sphere in which the angles between
the curves on the sphere and the corresponding curves on the map are the same
(see Fig. 8.13, which depicts the so-called stereographic projection).




Fig. 8.13. 


3. The tangent space. 


a) Verify by direct computation that the tangent manifold $TS_{x_0}$ to a smooth
$k$-dimensional surface $S \subset \mathbb{R}^n$ at a point $x_0 \in S$ is independent of the choice of the
coordinate system in $\mathbb{R}^n$.

b) Show that if a smooth surface $S \subset D$ maps to a smooth surface $S' \subset D'$ under
a diffeomorphism $f : D \to D'$ of the domain $D \subset \mathbb{R}^n$ onto the domain $D' \subset \mathbb{R}^n$ and
the point $x_0 \in S$ maps to $x'_0 \in S'$, then under the linear mapping $f'(x_0) : \mathbb{R}^n \to \mathbb{R}^n$
tangent to $f$ at $x_0 \in D$ the vector space $TS_{x_0}$ maps isomorphically to the vector
space $TS'_{x'_0}$.

c) If under the conditions of the preceding problem the mapping $f : D \to D'$ is
any mapping of class $C^{(1)}(D; D')$ under which $f(S) \subset S'$, then $f'(TS_{x_0}) \subset TS'_{x'_0}$.

d) Show that the orthogonal projection of a smooth $k$-dimensional surface
$S \subset \mathbb{R}^n$ onto the $k$-dimensional tangent plane $TS_{x_0}$ to it at $x_0 \in S$ is one-to-one in
some neighborhood of the point of tangency $x_0$.

e) Suppose, under the conditions of the preceding problem, that $\xi \in TS_{x_0}$ and
$\|\xi\| = 1$.

The equation $x - x_0 = \xi t$ of a line in $\mathbb{R}^n$ lying in $TS_{x_0}$ can be used to characterize
each point $x \in TS_{x_0} \setminus x_0$ by the pair $(t, \xi)$. These are essentially polar coordinates
in $TS_{x_0}$.

Show that smooth curves on the surface $S$ intersecting only at the point $x_0$
correspond to the lines $x - x_0 = \xi t$ in a neighborhood of $x_0$. Verify that, retaining
$t$ as the parameter on these curves, we obtain paths along which the velocity at
$t = 0$ is the vector $\xi \in TS_{x_0}$ that determines the line $x - x_0 = \xi t$ from which the
given curve on $S$ is obtained.

Thus the pairs $(t, \xi)$, where $\xi \in TS_{x_0}$, $\|\xi\| = 1$, and $t$ are real numbers from
some neighborhood $U(0)$ of zero in $\mathbb{R}$, can serve as the analogue of polar coordinates
in a neighborhood of $x_0 \in S$.


4. Let the function $F \in C^{(1)}(\mathbb{R}^n; \mathbb{R})$ having no critical points be such that the
equation $F(x^1, \ldots, x^n) = 0$ defines a compact surface $S$ in $\mathbb{R}^n$ (that is, $S$ is compact
as a subset of $\mathbb{R}^n$). For any point $x \in S$ we find a vector $n(x) = \operatorname{grad} F(x)$ normal
to $S$ at $x$. If we force each point $x \in S$ to move uniformly with velocity $n(x)$, a
mapping $S \ni x \mapsto x + n(x)t \in \mathbb{R}^n$ arises.

a) Show that for values of $t$ sufficiently close to zero, this mapping is bijective
and for each such value of $t$ a smooth surface $S_t$ results from $S$.

b) Let $E$ be a set in $\mathbb{R}^n$; we define the $\delta$-neighborhood of the set $E$ to be the
set of points in $\mathbb{R}^n$ whose distance from $E$ is less than $\delta$.
Show that for values of $t$ close to zero, the equation
$$F(x^1, \ldots, x^n) = t$$


defines a compact surface S; C R”, and show that the surface S; lies in the 6(t)- 
neighborhood of the surface S+, where ô(t) = o(t) as t > 0. 
c) With each point $x$ of the surface $S = S_0$ we associate a unit normal vector
$$n(x) = \frac{\operatorname{grad} F(x)}{\|\operatorname{grad} F(x)\|}$$
and consider the new mapping $S \ni x \mapsto x + n(x)t \in \mathbb{R}^n$.
Show that for all values of $t$ sufficiently close to zero this mapping is bijective,
that the surface $S_t$ obtained from $S$ at the specific value of $t$ is smooth, and if
$t_1 \neq t_2$, then $S_{t_1} \cap S_{t_2} = \varnothing$.

d) Relying on the result of the preceding exercise, show that there exists
$\delta > 0$ such that there is a one-to-one correspondence between the points of the
$\delta$-neighborhood of the surface $S$ and the pairs $(t, x)$, where $t \in\, ]-\delta, \delta[\, \subset \mathbb{R}$, $x \in S$. If
$(t^1, \ldots, t^k)$ are local coordinates on the surface $S$ in the neighborhood $U_S(x_0)$ of $x_0$,
then the quantities $(t, t^1, \ldots, t^k)$ can serve as local coordinates in a neighborhood
$U(x_0)$ of $x_0 \in \mathbb{R}^n$.

e) Show that for $|t| < \delta$ the point $x \in S$ is the point of the surface $S$ closest
to $(x + n(x)t) \in \mathbb{R}^n$. Thus for $|t| < \delta$ the surface $S_t$ is the geometric locus of the
points of $\mathbb{R}^n$ at distance $|t|$ from $S$.


5. a) Let $d_p : S \to \mathbb{R}$ be the function on the smooth $k$-dimensional surface $S \subset \mathbb{R}^n$
defined by the equality $d_p(x) = \|p - x\|$, where $p$ is a fixed point of $\mathbb{R}^n$, $x$ is a point
of $S$, and $\|p - x\|$ is the distance between these points in $\mathbb{R}^n$.

Show that at the extrema of the function $d_p(x)$ the vector $p - x$ is orthogonal
to the surface $S$.

b) Show that on any line that intersects the surface $S$ orthogonally at the point
$q \in S$, there are at most $k$ points $p$ such that the function $d_p(x)$ has $q$ as a degenerate
critical point (that is, a point at which the Hessian of the function vanishes).

c) Show that in the case of a curve $S$ ($k = 1$) in the plane $\mathbb{R}^2$ ($n = 2$) the point
$p$ for which the point $q \in S$ is a degenerate critical point of $d_p(x)$ is the center of
curvature of the curve $S$ at the point $q \in S$.


6. In the plane $\mathbb{R}^2$ with Cartesian coordinates $x, y$ construct the level curves of the
function $f(x, y) = xy$ and the curve
$$S = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 = 1\}.$$
Using the resulting picture, carry out a complete investigation of the problem
of extrema of the function $f|_S$.
7. The following functions of class $C^{(2)}(\mathbb{R}^2; \mathbb{R})$ are defined on the plane $\mathbb{R}^2$ with
Cartesian coordinates $x, y$:
$$f(x, y) = x^2 - y; \qquad F(x, y) = \begin{cases} x^2 - y + e^{-1/x^2}\sin\frac{1}{x}, & \text{if } x \neq 0, \\ x^2 - y, & \text{if } x = 0. \end{cases}$$

a) Draw the level curves of the function $f(x, y)$ and the curve $S$ defined by the
relation $F(x, y) = 0$.

b) Investigate the function $f|_S$ for extrema.

c) Show that the condition that the form $\partial_{ij}f(x_0)\xi^i\xi^j$ be positive-definite or
negative-definite on $TS_{x_0}$, in contrast to the condition for the form $\partial_{ij}L(x_0)\xi^i\xi^j$ on
$TS_{x_0}$ given in Theorem 2, is still not sufficient for the possible extremum $x_0 \in S$
to be an actual extremum of the function $f|_S$.

d) Check to see whether the point $x_0 = (0, 0)$ is critical for the function $f$ and
whether one can study the behavior of $f$ in a neighborhood of this point using only
the second (quadratic) term of Taylor’s formula, as was assumed in c).


8. In determining principal curvatures and principal directions in differential
geometry it is useful to know how to find an extremum of a quadratic form $h_{ij}u^iu^j$
under the hypothesis that another (positive-definite) form $g_{ij}u^iu^j$ is constant. Solve
this problem by analogy with Example 9, which was discussed above.


9. Let $A = [a^i_j]$ be a square matrix of order $n$ such that
$$\sum_{i=1}^{n} (a^i_j)^2 = H_j \qquad (j = 1, \ldots, n),$$
where $H_1, \ldots, H_n$ is a fixed set of $n$ nonnegative real numbers.

a) Show that $\det^2 A$ can have an extremum under these conditions only if the
rows of the matrix $A$ are pairwise orthogonal vectors in $\mathbb{R}^n$.

b) Starting from the equality
$$\det{}^2 A = \det A \cdot \det A^*,$$
where $A^*$ is the transpose of $A$, show that under the conditions above
$$\max \det{}^2 A = H_1 \cdots H_n.$$

c) Prove Hadamard’s inequality for any matrix $[a^i_j]$:
$$\det{}^2(a^i_j) \le \prod_{j=1}^{n}\left(\sum_{i=1}^{n} (a^i_j)^2\right).$$

d) Give an intuitive-geometric interpretation of Hadamard’s inequality.


10. a) Draw the level surfaces of the function f and the plane S in Example 10. 
Explain the result obtained in this example on the figure. 

b) Draw the level curves of the function f and the line S in Example 11. Explain 
the result obtained in this example on the figure. 
11. In Example 6 of Sect. 5.4, starting from Fermat’s principle we obtained Snell’s
law for refraction of light at the interface of two media when the interface is a plane. 
Does this law remain valid for an arbitrary smooth interface? 


12. a) A point mass in a potential force field can be in an equilibrium position 
(also called a state of rest or a stationary state) only at critical (stationary) points 
of the potential. In this situation a position of stable equilibrium corresponds to a 
strict local minimum of the potential and an unstable equilibrium point to a local 
maximum. Verify this. 


b) To which constrained extremal problem (solved by Lagrange) does the prob- 
lem of the equilibrium position reduce for a point mass in a potential force field 
(for example, gravitational) with ideal constraints (for example, a point may be 
confined to a smooth surface or a bead may be confined to a smooth thread or a 
ball to a track)? The constraint is ideal (there is no friction); this means that its 
effect on the point (the reactive force of the constraint) is always normal to the 
constraint. 


c) What physical (mechanical) meaning do the expansion (8.166), the necessary
criterion for a constrained extremum, and the Lagrange multiplier have in this case?
Note that each of the functions of the system (8.160) can be divided by the
absolute value of its gradient, which obviously leads to an equivalent system (if its
rank is equal to $m$ everywhere). Hence all the vectors $\operatorname{grad} F^i(x_0)$ on the right-hand
side of (8.166) can be regarded as unit normal vectors to the corresponding surface.


d) Do you agree that the Lagrange method of finding a constrained extremum 
becomes obvious and natural after the physical interpretation just given? 


Some Problems 
from the Midterm Examinations 


1. Introduction to Analysis 
(Numbers, Functions, Limits) 


Problem 1. The length of a hoop girdling the Earth at the equator is in- 
creased by 1 meter, leaving a gap between the Earth and the hoop. Could 
an ant crawl through this gap? How big would the absolute and relative in- 
creases in the radius of the Earth be if the equator were lengthened by this 
amount? (The radius of the Earth is approximately 6400 km.) 


Problem 2. How are the completeness (continuity) of the real numbers, the 
unboundedness of the series of natural numbers, and Archimedes’ principle 
related? Why is it possible to approximate every real number arbitrarily 
closely by rational numbers? Explain using the model of rational fractions 
(rational functions) that Archimedes’ principle may fail, and that in such 
number systems the sequence of natural numbers is bounded and there exist 
infinitely small numbers. 


Problem 3. Four bugs sitting at the corners of the unit square begin to 
chase one another with unit speed, each maintaining a course in the direction 
of the one pursued. Describe the trajectories of their motions. What is the 
length of each trajectory? What is the law of motion (in Cartesian or polar 
coordinates)? 


Problem 4. Draw a flow chart for computing $\sqrt{a}$ ($a > 0$) by the recursive
procedure
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right).$$
How is equation solving related to finding fixed points? How do you find $\sqrt[n]{a}$?
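
The recursive procedure of Problem 4 can be sketched in code as follows. This is a minimal illustration rather than a prescribed solution: the function names, the starting value, and the stopping rule are assumptions made here. The second function applies Newton's method to $x^n - a = 0$, which is one possible way (assumed here) to treat the $n$th root as a fixed-point problem.

```python
# Minimal sketch of the iteration x_{k+1} = (x_k + a/x_k)/2 (names are hypothetical).

def heron_sqrt(a, x0=1.0, tol=1e-12, max_iter=100):
    """Approximate sqrt(a) for a > 0 by repeating x -> (x + a/x)/2."""
    if a <= 0:
        raise ValueError("a must be positive")
    x = x0
    for _ in range(max_iter):
        x_next = 0.5 * (x + a / x)
        if abs(x_next - x) <= tol * x_next:   # relative change is negligible
            return x_next
        x = x_next
    return x

def newton_root(a, n, x0=1.0, tol=1e-12, max_iter=200):
    """Approximate the n-th root of a > 0 by Newton's method for x**n - a = 0."""
    if a <= 0 or n < 1:
        raise ValueError("need a > 0 and n >= 1")
    x = x0
    for _ in range(max_iter):
        x_next = ((n - 1) * x + a / x ** (n - 1)) / n   # fixed point is a**(1/n)
        if abs(x_next - x) <= tol * x_next:
            return x_next
        x = x_next
    return x

print(heron_sqrt(2.0), newton_root(27.0, 3))
```

For $n = 2$ the second map reduces to the first, so the square-root procedure is the special case of the same fixed-point scheme.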


Problem 5. Let $g(x) = f(x) + o(f(x))$ as $x \to \infty$. Is it also true that
$f(x) = g(x) + o(g(x))$ as $x \to \infty$?


Problem 6. By the method of undetermined coefficients (or otherwise) find
the first few (or all) coefficients of the power series for $(1 + x)^\alpha$ with
$\alpha = -1, -\frac{1}{2}, 0, \frac{1}{2}, 1, \frac{3}{2}$. (By interpolating the coefficients of like powers of $x$ in such
expansions, Newton wrote out the law for forming the coefficients with any
$\alpha \in \mathbb{R}$. This result is known as Newton’s binomial theorem.)
Problem 7. Knowing the power-series expansion of the function $e^x$, find by
the method of undetermined coefficients (or otherwise) the first few (or all)
terms of the power-series expansion of the function $\ln(1 + x)$.


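Problem 8. Compute $\exp A$ when $A$ is one of the matrices
$$\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$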
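
A hand computation of $\exp A$ can be cross-checked against partial sums of the series $I + A + A^2/2! + \cdots$. The sketch below is only an illustration: the helper names `mat_mul` and `mat_exp` and the choice of 40 summands are assumptions made here, and plain nested lists are used instead of a matrix library.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """Partial sum I + A + A^2/2! + ... with `terms` summands."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # current summand A^k / k!
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

nilpotent = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]    # its cube is zero, so the series stops
diagonal = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
print(mat_exp(nilpotent))
print(mat_exp(diagonal)[2][2], math.exp(3))
```

For the nilpotent matrices the series terminates after finitely many terms, and for the diagonal matrix the exponential acts entry by entry on the diagonal, which is what the comparison with `math.exp` illustrates.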
Problem 9. How many terms of the series for $e^x$ must one take in order
to obtain a polynomial that makes it possible to compute $e^x$ on the interval
$[-3, 5]$ within $10^{-2}$?
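
One way to approach Problem 9 is through the Lagrange form of the remainder. The sketch below assumes that "within $10^{-2}$" means absolute error and uses the crude estimate $|r_n(x)| \le e^5 \cdot 5^{n+1}/(n+1)!$ on $[-3, 5]$; the function names are introduced here for illustration. The degree it returns is sufficient by this bound, not necessarily the smallest possible.

```python
import math

def remainder_bound(n, left=-3.0, right=5.0):
    """Lagrange bound for |e^x - sum_{k<=n} x^k/k!| over [left, right]."""
    radius = max(abs(left), abs(right))
    return math.exp(max(right, 0.0)) * radius ** (n + 1) / math.factorial(n + 1)

def smallest_degree(tolerance=1e-2):
    """Smallest n for which the bound above drops below the tolerance."""
    n = 0
    while remainder_bound(n) >= tolerance:
        n += 1
    return n

def taylor_exp(x, n):
    """Taylor polynomial of e^x of degree n at the origin."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

n = smallest_degree()
sampled_error = max(abs(taylor_exp(x, n) - math.exp(x)) for x in (-3.0, 0.0, 2.5, 5.0))
print(n + 1, "terms suffice by this bound; largest sampled error:", sampled_error)
```

The sampled errors are typically much smaller than the bound, since the factor $e^5$ overestimates $e^\xi$ on most of the interval.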


Problem 10. Sketch the graphs of the following functions:
$$\text{a) } \log_{\cos x} \sin x; \qquad \text{b) } \arctan\frac{x^3}{(1 - x)(1 + x)}.$$


2. One-variable Differential Calculus 


Problem 1. Show that if the acceleration vector a(t) is orthogonal to the 
vector v(t) at each instant of time t, the magnitude |v(t)| remains constant. 


Problem 2. Let $(x, t)$ and $(\tilde{x}, \tilde{t})$ be respectively the coordinate of a moving
point and the time in two systems of measurement. Assuming the formulas
$\tilde{x} = \alpha x + \beta t$ and $\tilde{t} = \gamma x + \delta t$ for transition from one system to the other
are known, find the formula for the transformation of velocities, that is, the
connection between $v = \frac{dx}{dt}$ and $\tilde{v} = \frac{d\tilde{x}}{d\tilde{t}}$.


Problem 3. The function $f(x) = x^2 \sin\frac{1}{x}$ for $x \neq 0$, $f(0) = 0$ is differentiable
on $\mathbb{R}$, but $f'$ is discontinuous at $x = 0$ (verify this). We shall “prove,”
however, that if $f : \mathbb{R} \to \mathbb{R}$ is differentiable on $\mathbb{R}$, then $f'$ is continuous at
every point $a \in \mathbb{R}$. By Lagrange’s theorem
$$\frac{f(x) - f(a)}{x - a} = f'(\xi),$$
where $\xi$ is a point between $a$ and $x$. Then if $x \to a$, it follows that $\xi \to a$. By
definition,
$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a} = f'(a),$$
and since this limit exists, the right-hand side of Lagrange’s formula has a
limit equal to it. That is, $f'(\xi) \to f'(a)$ as $\xi \to a$. The continuity of $f'$ at $a$
is now “proved.” Where is the error?
Problem 4. Suppose the function $f$ has $n + 1$ derivatives at the point $x_0$,
and let $\xi = x_0 + \theta_x(x - x_0)$ be the intermediate point in Lagrange’s formula
for the remainder term $\frac{1}{n!} f^{(n)}(\xi)(x - x_0)^n$, so that $0 < \theta_x < 1$. Show that
$\theta_x \to \frac{1}{n+1}$ as $x \to x_0$ if $f^{(n+1)}(x_0) \neq 0$.


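Problem 5. Prove the inequality
$$a_1^{\alpha_1} \cdots a_n^{\alpha_n} \le \alpha_1 a_1 + \cdots + \alpha_n a_n,$$
where $a_1, \ldots, a_n, \alpha_1, \ldots, \alpha_n$ are nonnegative and $\alpha_1 + \cdots + \alpha_n = 1$.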


Problem 6. Show that
$$\lim_{n \to \infty}\left(1 + \frac{z}{n}\right)^n = e^x(\cos y + i \sin y) \qquad (z = x + iy),$$
so that it is natural to suppose that $e^{iy} = \cos y + i \sin y$ (Euler’s formula) and
$$e^z = e^x e^{iy} = e^x(\cos y + i \sin y).$$


Problem 7. Find the shape of the surface of a liquid rotating at uniform 
angular velocity in a glass. 


Problem 8. Show that the tangent to the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ at the point
$(x_0, y_0)$ has the equation $\frac{x x_0}{a^2} + \frac{y y_0}{b^2} = 1$, and that light rays from a source
situated at one of the foci $F_1 = (-\sqrt{a^2 - b^2}, 0)$, $F_2 = (\sqrt{a^2 - b^2}, 0)$ of an
ellipse with semiaxes $a > b > 0$ are reflected by an elliptical mirror to the
other focus.


Problem 9. A particle subject to gravity, without any initial boost, begins 
to slide from the top of an iceberg of elliptic cross-section. The equation of 
the cross section is $x^2 + 5y^2 = 1$, $y \ge 0$. Compute the trajectory of the motion
of the particle until it reaches the ground. 


3. Integration and Introduction to Several Variables 


Problem 1. Knowing the inequalities of Hölder, Minkowski, and Jensen for
sums, obtain the corresponding inequalities for integrals.


Problem 2. Compute the integral $\int_0^1 e^{-x^2}\,dx$ with a relative error of less than
10%.


Problem 3. The function $\operatorname{erf}(x) = \frac{1}{\sqrt{\pi}} \int_{-x}^{x} e^{-t^2}\,dt$, called the probability
error integral, has limit 1 as $x \to +\infty$. Draw the graph of this function and
find its derivative. Show that as $x \to +\infty$
$$\operatorname{erf}(x) = 1 - \frac{2}{\sqrt{\pi}}\,e^{-x^2}\left(\frac{1}{2x} - \frac{1}{2^2 x^3} + \frac{1 \cdot 3}{2^3 x^5} - \frac{1 \cdot 3 \cdot 5}{2^4 x^7} + o\left(\frac{1}{x^7}\right)\right).$$
How can this asymptotic formula be extended to a series? Are there any
values of $x \in \mathbb{R}$ for which this series converges?
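
The four-term asymptotic formula for $\operatorname{erf}$ can be compared with a library value. The sketch below is an illustration only: the function name `erf_asymptotic` is introduced here, and the reference value comes from `math.erf` in the Python standard library.

```python
import math

def erf_asymptotic(x):
    """Four-term asymptotic approximation of erf(x) for large positive x."""
    bracket = (1 / (2 * x)
               - 1 / (2**2 * x**3)
               + (1 * 3) / (2**3 * x**5)
               - (1 * 3 * 5) / (2**4 * x**7))
    return 1 - 2 / math.sqrt(math.pi) * math.exp(-x * x) * bracket

for x in (2.0, 3.0, 4.0):
    # The discrepancy shrinks quickly as x grows.
    print(x, abs(erf_asymptotic(x) - math.erf(x)))
```

Such a comparison only illustrates the behavior for fixed truncation and growing $x$; it says nothing by itself about what happens when more terms are added at a fixed $x$, which is the question posed at the end of the problem.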


Problem 4. Does the length of a path depend on the law of motion (the 
parametrization)? 


Problem 5. You are holding one end of a rubber band of length 1 km. A 
beetle is crawling toward you from the other end, which is clamped, at a rate 
of 1 cm/s. Each time it crawls 1 cm you lengthen the band by 1 km. Will 
the beetle ever reach your hand? If so, approximately how much time will it 
require? (A problem of L.B. Okun’, proposed to A. D. Sakharov.) 
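
A rough order-of-magnitude estimate for Problem 5 can be sketched under a discrete model, which is an assumption made here: the band is stretched uniformly and instantaneously after each centimetre crawled, so during the $k$th second the beetle covers the fraction $1/(N k)$ of the band, with $N = 10^5$ cm in 1 km. The beetle arrives once the harmonic sum $H_n$ reaches $N$; the function names below are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def steps_by_simulation(band_units):
    """Smallest n with 1 + 1/2 + ... + 1/n >= band_units (small inputs only)."""
    total, n = 0.0, 0
    while total < band_units:
        n += 1
        total += 1.0 / n
    return n

def log10_steps_needed(band_units=100_000):
    """log10 of the step count, using the approximation H_n ~ ln n + gamma."""
    return (band_units - EULER_GAMMA) / math.log(10)

print(steps_by_simulation(5), math.exp(5 - EULER_GAMMA))  # sanity check of the approximation
print(log10_steps_needed())  # decimal exponent of the number of seconds
```

Direct simulation is feasible only for a toy band length; for the actual data the step count is far too large to iterate over, which is why only its logarithm is computed.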


Problem 6. Calculate the work done in moving a mass in the gravitational 
field of the Earth and show that this work depends only on the elevation of 
the initial and terminal positions. Find the work done in escaping from the 
Earth’s gravitational field and the corresponding escape velocity. 


Problem 7. Using the example of a pendulum and a double pendulum ex- 
plain how it is possible to introduce local coordinate systems and neighbor- 
hoods into the set of corresponding configurations and how a natural topology 
thereby arises making it into the configuration space of a mechanical system. 
Is this space metrizable under these conditions? 


Problem 8. Is the unit sphere in $\mathbb{R}^n$ compact? In $C[a, b]$?


Problem 9. A subset of a given set is called an $\varepsilon$-grid if any point of the
set lies at a distance less than $\varepsilon$ from some point of the subset. Denote by $N(\varepsilon)$
the smallest possible number of points in an $\varepsilon$-grid for a given set. Estimate
the $\varepsilon$-entropy $\log_2 N(\varepsilon)$ of a closed line segment, a square, a cube, and a
bounded region in $\mathbb{R}^n$. Does the quantity $\frac{\log_2 N(\varepsilon)}{\log_2 (1/\varepsilon)}$ as $\varepsilon \to 0$ give a picture
of the dimension of the space under consideration? Can such a dimension be
equal, for example, to 0.5?


Problem 10. On the surface of the unit sphere $S$ in $\mathbb{R}^3$ the temperature
$T$ varies continuously as a function of a point. Must there be points on the
sphere where the temperature reaches a minimum or a maximum? If there
are points where the temperature assumes two given values, must there be
points where it assumes intermediate values? How much of this is valid when
the unit sphere is taken in the space $C[a, b]$ and the temperature at the point
$f \in S$ is given as
$$T(f) = \left(\int_a^b |f|(x)\,dx\right)^{-1}?$$


Problem 11. a) Taking 1.5 as an initial approximation to $\sqrt{2}$, carry out
two iterations using Newton’s method and observe how many decimal places
of accuracy you obtain at each step.

b) By a recursive procedure find a function $f$ satisfying the equation
$$f(x) = x + \int_0^x f(t)\,dt.$$
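
Part a) of Problem 11 can be followed step by step in code. This is a minimal sketch, assuming Newton's method is applied to the equation $x^2 - 2 = 0$; the function name `newton_sqrt2_steps` is introduced here for illustration.

```python
import math

def newton_sqrt2_steps(x0=1.5, iterations=2):
    """Newton iterates for x**2 - 2 = 0, starting from x0."""
    steps = [x0]
    x = x0
    for _ in range(iterations):
        x = x - (x * x - 2.0) / (2.0 * x)   # x - g(x)/g'(x) with g(x) = x^2 - 2
        steps.append(x)
    return steps

steps = newton_sqrt2_steps()
for k, x in enumerate(steps):
    # The error column shows how fast the number of correct digits grows.
    print(k, x, abs(x - math.sqrt(2.0)))
```

The iterates are the fractions $17/12$ and $577/408$, so the accuracy at each step can also be read off by hand.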


4. Differential Calculus of Several Variables 


Problem 1. a) What is the relative error $\delta = \frac{|\Delta f|}{|f|}$ in computing the value
of a function $f(x, y, z)$ at a point $(x, y, z)$ whose coordinates have absolute
errors $\Delta x$, $\Delta y$, and $\Delta z$ respectively?

b) What is the relative error in computing the volume of a room whose
dimensions are as follows: length $x = 5 \pm 0.05$ m, width $y = 4 \pm 0.04$ m, height
$z = 3 \pm 0.03$ m?

c) Is it true that the relative error of the value of a linear function coincides 
with the relative error of the value of its argument? 

d) Is it true that the differential of a linear function coincides with the 
function itself? 


e) Is it true that the relation f’ = f holds for a linear function f? 
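
For part b) of Problem 1 the linearized estimate can be compared with a direct worst-case computation. The sketch below is illustrative: it assumes the volume is $V = xyz$ and that the linearized relative error of a product is the sum of the relative errors of the factors; the function names are introduced here.

```python
def linearized_relative_error(values, abs_errors):
    """First-order relative error of a product: sum of the relative errors of the factors."""
    return sum(err / abs(val) for val, err in zip(values, abs_errors))

def worst_case_relative_error(values, abs_errors):
    """Relative error of the product when every factor is at its upper limit."""
    nominal, upper = 1.0, 1.0
    for val, err in zip(values, abs_errors):
        nominal *= val
        upper *= val + err
    return upper / nominal - 1.0

dims = (5.0, 4.0, 3.0)      # length, width, height in metres
errs = (0.05, 0.04, 0.03)   # absolute errors in metres
print(linearized_relative_error(dims, errs), worst_case_relative_error(dims, errs))
```

The two numbers differ only in second-order terms, which is what the differential approximation neglects.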


Problem 2. a) One of the partial derivatives of a function of two variables 
defined in a disk equals zero at every point. Does that mean that the function 
is independent of the corresponding variable in that disk? 

b) Does the answer change if the disk is replaced by an arbitrary convex 
region? 

c) Does the answer change if the disk is replaced by an arbitrary region? 


d) Let $x = x(t)$ be the law of motion of a point in the plane (or in $\mathbb{R}^n$) in
the time interval $t \in [a, b]$. Let $v(t)$ be its velocity as a function of time and
$C = \operatorname{conv}\{v(t) \mid t \in [a, b]\}$ the smallest convex set containing all the vectors
$v(t)$ (usually called the convex hull of a set that spans it). Show that there
is a vector $v$ in $C$ such that $x(b) - x(a) = v \cdot (b - a)$.


Problem 3. a) Let $F(x, y, z) = 0$. Is it true that $\frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial x} \cdot \frac{\partial x}{\partial z} = -1$? Verify
this for the relation $\frac{xy}{z} - 1 = 0$ (corresponding to the Clapeyron equation of
state of an ideal gas: $\frac{PV}{T} = R$).

b) Now let $F(x, y) = 0$. Is it true that $\frac{\partial y}{\partial x} \cdot \frac{\partial x}{\partial y} = 1$?

c) What can you say in general about the relation $F(x_1, \ldots, x_n) = 0$?

d) How can you find the first few terms of the Taylor expansion of the
implicit function $y = f(x)$ defined by an equation $F(x, y) = 0$ in a neighborhood
of a point $(x_0, y_0)$, knowing the first few terms of the Taylor expansion
of the function $F(x, y)$ in a neighborhood of $(x_0, y_0)$, where $F(x_0, y_0) = 0$
and $F'_y(x_0, y_0)$ is invertible?
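
For the particular relation in part a) of Problem 3 the verification can be written out directly. The computation below is a sketch, assuming $x, y, z \neq 0$ and treating each variable in turn as a function of the other two.

```latex
% Relation: xy/z - 1 = 0, i.e. z = xy, y = z/x, x = z/y  (assuming x, y, z nonzero).
\[
\frac{\partial z}{\partial y} = x, \qquad
\frac{\partial y}{\partial x} = -\frac{z}{x^{2}}, \qquad
\frac{\partial x}{\partial z} = \frac{1}{y},
\]
\[
\frac{\partial z}{\partial y}\cdot\frac{\partial y}{\partial x}\cdot\frac{\partial x}{\partial z}
  = x\cdot\Bigl(-\frac{z}{x^{2}}\Bigr)\cdot\frac{1}{y}
  = -\frac{z}{xy} = -1 .
\]
```

The last equality uses the relation $z = xy$ itself; the general case follows the same pattern with the partial derivatives expressed through $F'_x$, $F'_y$, $F'_z$ by the implicit function theorem.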
Problem 4. a) Verify that the plane tangent to the ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$
at the point $(x_0, y_0, z_0)$ can be defined by the equation $\frac{x x_0}{a^2} + \frac{y y_0}{b^2} + \frac{z z_0}{c^2} = 1$.

b) The point $P(t) = \left(\frac{a}{\sqrt{3}}, \frac{b}{\sqrt{3}}, \frac{c}{\sqrt{3}}\right) \cdot t$ emerged from the ellipsoid
$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ at time $t = 1$. Let $p(t)$ be the point of the same ellipsoid closest to
$P(t)$ at time $t$. Find the limiting position of $p(t)$ as $t \to +\infty$.


Problem 5. a) In the plane $\mathbb{R}^2$ with Cartesian coordinates $(x, y)$ construct
the level curves of the function $f(x, y) = xy$ and the curve
$S = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 = 1\}$. Using the resulting picture, carry out a complete study of
the extremal problem for $f|_S$, the restriction of $f$ to the circle $S$.

b) What is the physical meaning of the Lagrange multiplier in Lagrange’s
method of finding extrema with constraints when an equilibrium position is
sought for a point mass in a gravitational field if the motion of the point is
constrained by ideal relations (for example, relations of the form $F_1(x, y, z) = 0$,
$F_2(x, y, z) = 0$)?


Examination Topics 


1. First Semester 


1.1. Introduction to Analysis
and One-variable Differential Calculus 


1. Real numbers. Bounded (from above or below) numerical sets. The axiom 
of completeness and the existence of a least upper (greatest lower) bound of 
a set. Unboundedness of the set of natural numbers. 


2. Fundamental lemmas connected with the completeness of the set of real 
numbers R (nested interval lemma, finite covering, limit point). 


3. Limit of a sequence and the Cauchy criterion for its existence. Tests for 
the existence of a limit of a monotonic sequence. 


4. Infinite series and the sum of an infinite series. Geometric progressions. 
The Cauchy criterion and a necessary condition for the convergence of a 
series. The harmonic series. Absolute convergence. 


5. A test for convergence of a series of nonnegative terms. The comparison
theorem. The series $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$.


6. The idea of a logarithm and the number $e$. The function $\exp(x)$ and the
power series that represents it.


7. The limit of a function. The most important filter bases. Definition of 
the limit of a function over an arbitrary base and its decoding in specific 
cases. Infinitesimal functions and their properties. Comparison of the ultimate 
behavior of functions, asymptotic formulas, and the basic operations with the 
symbols o(-) and O(-). 

8. The connection of passage to the limit with the algebraic operations and
the order relation in $\mathbb{R}$. The limit of $\frac{\sin x}{x}$ as $x \to 0$.

9. The limit of a composite function and a monotonic function. The limit of
$\left(1 + \frac{1}{x}\right)^x$ as $x \to \infty$.
10. The Cauchy criterion for the existence of the limit of a function. 
11. Continuity of a function at a point. Local properties of continuous func-
tions (local boundedness, conservation of sign, arithmetic operations, conti- 
nuity of a composite function). Continuity of polynomials, rational functions, 
and trigonometric functions. 

12. Global properties of continuous functions (intermediate-value theorem, 
maxima, uniform continuity). 

13. Discontinuities of monotonic functions. The inverse function theorem. 
Continuity of the inverse trigonometric functions. 

14. The law of motion, displacement over a small interval of time, the instan- 
taneous velocity vector, trajectories and their tangents. Definition of differ- 
entiability of a function at a point. The differential, its domain of definition 
and range of values. Uniqueness of the differential. The derivative of a real- 
valued function of a real variable and its geometric meaning. Differentiability 
of $\sin x$, $\cos x$, $e^x$, $\ln |x|$, and $x^\alpha$.

15. Differentiability and the arithmetic operations. Differentiation of poly- 
nomials, rational functions, the tangent, and the cotangent. 

16. The differential of a composite function and an inverse function. Deriva- 
tives of the inverse trigonometric functions. 

17. Local extrema of a function. A necessary condition for an interior ex- 
tremum of a differentiable function (Fermat’s lemma). 


18. Rolle’s theorem. The finite-increment theorems of Lagrange and Cauchy 
(mean-value theorems). 


19. Taylor’s formula with the Cauchy and Lagrange forms of the remainder. 


20. Taylor series. The Taylor expansions of $e^x$, $\cos x$, $\sin x$, $\ln(1 + x)$, and 
$(1 + x)^\alpha$ (Newton’s binomial formula). 


21. The local Taylor formula (Peano form of the remainder). 


22. The connection between the type of monotonicity of a differentiable func- 
tion and the sign of its derivative. Sufficient conditions for the presence or 
absence of a local extremum in terms of the first, second, and higher-order 
derivatives. 


23. Convex functions. Differential conditions for convexity. Location of the 
graph of a convex function relative to its tangent. 


24. The general Jensen inequality for a convex function. Convexity (or con- 
cavity) of the logarithm. The classical inequalities of Cauchy, Young, Hölder, 
and Minkowski. 


25. Complex numbers in algebraic and trigonometric notation. Convergence 
of a sequence of complex numbers and a series with complex terms. The 
Cauchy criterion. Absolute convergence and sufficient conditions for absolute 
convergence of a series with complex terms. The limit $\lim_{n \to \infty} \left(1 + \frac{z}{n}\right)^n$. 


26. The disk of convergence and the radius of convergence of a power series. 
The definition of the functions $e^z$, $\cos z$, $\sin z$ ($z \in \mathbb{C}$). Euler’s formula and 
the connections among the elementary functions. 
27. Differential equations as a mathematical model of reality, examples. The 
method of undetermined coefficients and Euler’s (polygonal) method. 


28. Primitives and the basic methods of finding them (termwise integration of 
sums, integration by parts, change of variable). Primitives of the elementary 
functions. 


2. Second Semester 


2.1. Integration. Multivariable Differential Calculus 


1. The Riemann integral on a closed interval. A necessary condition for 
integrability. Sets of measure zero, their general properties, examples. The 
Lebesgue criterion for Riemann integrability of a function (statement only). 
The space of integrable functions and admissible operations on integrable 
functions. 


2. Linearity, additivity and general evaluation of an integral. 


3. Evaluating the integral of a real-valued function. The (first) mean-value 
theorem. 


4. Integrals with a variable upper limit of integration, their properties. Exis- 
tence of a primitive for a continuous function. The generalized primitive and 
its general form. 


5. The Newton-Leibniz formula. Change of variable in an integral. 


6. Integration by parts in a definite integral. Taylor’s formula with integral 
remainder. The second mean-value theorem. 


7. Additive (oriented) interval functions and integration. The general pattern 
in which integrals arise in applications, examples: length of a path (and its 
independence of parametrization), area of a curvilinear trapezoid (area under 
a curve), volume of a solid of revolution, work, energy. 


8. The Riemann-Stieltjes integral. Conditions under which it can be reduced 
to the Riemann integral. Singularities and the Dirac delta-function. The con- 
cept of a generalized function. 


9. The concept of an improper integral. Canonical integrals. The Cauchy 
criterion and the comparison theorem for studying the convergence of an 
improper integral. The integral test for convergence of a series. 


10. Local linearization, examples: instantaneous velocity and displacement; 
simplification of the equation of motion when the oscillations of a pendulum 
are small; computation of linear corrections to the values of $\exp(A)$, $A^{-1}$, 
$\det(E)$, $\langle a, b \rangle$ under small changes in the arguments (here $A$ is an invertible 
matrix, $E$ is the identity matrix, $a$ and $b$ are vectors, and $\langle \cdot, \cdot \rangle$ is the inner 
product). 
11. The norm (length, absolute value, modulus) of a vector in a vector space; 
the most important examples. The space $L(X, Y)$ of continuous linear 
transformations and the norm in it. Continuity of a linear transformation and 
finiteness of its norm. 

12. Differentiability of a function at a point. The differential, its domain of 
definition and range of values. Coordinate expression of the differential of a 
mapping $f : \mathbb{R}^m \to \mathbb{R}^n$. The relation between differentiability, continuity, and 
the existence of partial derivatives. 

13. Differentiation of a composite function and the inverse function. Coordi- 
nate expression of the resulting laws in application to different cases of the 
mapping $f : \mathbb{R}^m \to \mathbb{R}^n$. 

14. Derivative along a vector and the gradient. Geometric and physical ex- 
amples of the use of the gradient (level surfaces of functions, steepest descent, 
the tangent plane, the potential of a field, Euler’s equation for the dynamics 
of an ideal fluid, Bernoulli’s law, the work of a wing). 


15. Homogeneous functions and the Euler relation. The dimension method. 


16. The finite-increment theorem. Its geometric and physical meaning. Ex- 
amples of applications (a sufficient condition for differentiability in terms of 
the partial derivatives; conditions for a function to be constant in a domain). 


17. Higher-order derivatives and their symmetry. 
18. Taylor’s formula. 


19. Extrema of functions (necessary and sufficient conditions for an interior 
extremum). 


20. Contraction mappings. The Picard-Banach fixed-point principle. 
21. The implicit function theorem. 


22. The inverse function theorem. Curvilinear coordinates and rectification. 
Smooth $k$-dimensional surfaces in $\mathbb{R}^n$ and their tangent planes. Methods of 
defining a surface and the corresponding equations of the tangent space. 


23. The rank theorem and functional dependence. 


24. Extrema with constraint (necessary condition). Geometric, algebraic, and 
physical interpretation of the method of Lagrange multipliers. 


25. A sufficient condition for a constrained extremum. 


26. Metric spaces, examples. Open and closed subsets. Neighborhoods of a 
point. The induced metric, subspaces. Topological spaces. Neighborhoods of 
a point, separation properties (the Hausdorff axiom). The induced topology 
on subsets. Closure of a set and description of relatively closed subsets. 


27. Compact sets, their topological invariance. Closedness of a compact set 
and compactness of a closed subset of a compact set. Nested compact sets. 
Compact metric spaces, $\varepsilon$-grids. Criteria for a metric space to be compact 
and their specific form in $\mathbb{R}^n$. 
28. Complete metric spaces. Completeness of $\mathbb{R}$, $\mathbb{C}$, $\mathbb{R}^n$, $\mathbb{C}^n$, and the space 
$C[a, b]$ of continuous functions under uniform convergence. 


29. Criteria for continuity of a mapping between topological spaces. Preser- 
vation of compactness and connectedness under a continuous mapping. The 
classical theorems on boundedness, the maximum-value theorem, and the 
intermediate-value theorem for continuous functions. Uniform continuity on 
a compact metric space. 
— for a constrained extremum 

—— necessary, 530 

—— sufficient, 534 

— for a monotonic function, 218, 236 

— for an extremum, 237-239 

—— of a function of several variables, 
463 

— for convexity, 243-245 

— for differentiability, 439, 457 

— for integrability 

—— necessary, 333 

—— necessary and sufficient, 339, 343 

—— sufficient, 334-338 

— necessary, 2 

— sufficient, 2 

Constant 

— Euler’s, 147 

— gravitational, 59 

— Planck’s, 59 


Continuation 

— of a function, 12 
Continuum, 76 

Convergence 

— absolute, 269 

— necessary condition, 96 

— of a sequence, 80 

— of a series, 95 

—— absolute, 97, 99 

— of an improper integral, 394 
—— absolute, 399 

—— conditional, 403 
Convergence test 

— Abel-Dirichlet, 404 

— Cauchy’s, 99 

— integral, 400 

— Weierstrass’, 87, 99 
Coordinate of a point, 54, 412 
Coordinates 

— Cartesian, 433 

— curvilinear, 470, 471 

—— in R”, 515 

— polar, 500, 501 

— spherical, 501 

Cosine 

— circular, 275 

— hyperbolic, 201, 275 

Cosine integral, 327 
Cotangent 

— hyperbolic, 203 

Criterion 

— Cauchy 

—— for a function, 132, 420 
—— for an improper integral, 399 
—— for sequences, 85, 269, 419 
—— for series, 95, 269 

— for a constant function, 218, 236 
— for a constrained extremum 
—— necessary, 530 


—— sufficient, 534 


— for a monotonic function, 137, 
218, 236 

— for an extremum, 237 

—— necessary, 237, 463 

—— sufficient, 239, 465 

— for continuity of a monotonic 
function, 166 

— for integrability 


—— du Bois-Reymond, 346 

—— Lebesgue, 342, 346 

— for monotonic sequences, 87 

— for series of nonnegative terms, 98 
— integrability 

—— Darboux, 345 

— necessary, 2 

— Sylvester’s, 467 

Critical point, 464 

— index, 516 

— nondegenerate, 512, 516 
Curvature, 264 

Curve 

— level, 450, 482 

— parametrized, 377 

— simple closed, 377 

— unicursal, 325 

Curvilinear coordinates, 470, 471 
Cycloid, 392, 410 


D 

Decay, radioactive, 293-295 

Dependence, functional, 508, 516 

Derivative, 179-181, 279, 435 

— directional, 447 

— higher-order, 209 

— logarithmic, 199 

— of a function of a complex variable, 
279 

— one-sided, 262 

— partial, 436 

—— higher-order, 458 

— with respect to a vector, 446 

Diameter of a set, 417, 419 

Diffeomorphism, 498, 509 

— elementary, 509 

Difference 

— finite, 235 

— of sets, 8 

Differential equation, 289-303, 328 

— of harmonic oscillations, 300-303 

— with variables separable, 328 

Differential of a function, 178-185 

— of several variables, 435-438 

Differential of a mapping, 435, 438 

Differentiation 

— and arithmetic operations, 
193-196, 279, 440 
— of a composite function, 196-199, 
443 

— of a power series, 280 

— of an implicit function, 204-208 

— of an inverse function, 199-204, 
448 

Dimension 

— of a physical quantity, 453-455 

— of a surface, 517-518 

Directional derivative, 447 

Discontinuity, removable, 156 

Disk of convergence, 270 

Distance 

— between sets, 417 

— in R”, 412 

— on the real line, 56 

Divisor, 49 

— greatest common, 66 

Domain 

— in R”, 425 

— of a function, 12 

— of a relation, 20 


E 

Efficiency, 303 

Element 

— inverse, 37 

— maximal, 44 

— minimal, 44 

— negative, 36 

— neutral, 36, 37 

— of a set, 7, 8 

— zero, 36, 39 

Energy 

— kinetic, 15, 305, 386 

— potential, 15, 305, 386, 388 
— total, 15, 306, 387 
Equality 

— of functions, 12 

— of sets, 7 

Equation 

— differential, 178, 289-303, 328 
— Euler’s (hydrodynamic), 447, 451 
— heat, 475 
— Laplace’s, 475 

Equations 

— Cauchy-Riemann, 515 

— Euler-Lagrange, 496 

— Hamilton, 496 
Error 

— absolute, 59-60, 80, 198 

— relative, 59-60, 198, 451 

Error function, 405 

Escape velocity, 391 

Euclidean structure, 433 

Euler’s identity for homogeneous 
functions, 452 

Euler’s substitution, 322 

Expansion 

— partial fraction, 285, 317 

— Taylor, 211, 222-232, 281 

Explosion, 295 

Exponent, 119 

Exponential function, 190, 296-302 

— complex, 275, 301 

Exponential integral, 327, 409 

Extension of a function, 12 

Extremum 

— constrained, 517, 527-539 

— interior, 215, 237 

— of a function of several variables, 
463-466 


F 

Factorization of a polynomial, 284 

Falling bodies, 295 

Fiber, 22 

Fibonacci numbers, 105 

Field 

— algebraic, 37 

— Archimedean, 68 

— ordered, 68 

— potential, 447, 450, 544 

— vector, 447 

First mean-value theorem, 352 

Force function, 450 

Formula 

— barometric, 291-293, 304 

— Bonnet’s, 357 

— Cauchy-Hadamard, 270 

— change of variable 

—— in a definite integral, 364 

— de Moivre’s, 267, 276 

— Euler’s, 274 

— for change of variable in an 
indefinite integral, 312 

— for integration by parts, 312 

—— in an improper integral, 398 


— Hermite interpolation, 234 

— integration by parts, 371 

— Lagrange interpolation, 234 

— Leibniz’, 210 

— MacLaurin’s, 221 

— Meshcherskii’s, 291 

— Newton-Leibniz, 329, 364, 367, 
369 

— Ostrogradskii’s, 324 

— quadrature, 374 

—— rectangular, 373 

—— Simpson’s (parabolic), 374 

—— trapezoidal, 373 

— Taylor’s, 219-232, 461-476 

—— for functions of several variables, 461-476, 513 

—— local, 227, 463 

—— multi-index notation for, 476 

—— with integral form of the remainder, 461, 476, 513 

— Tsiolkovskii’s, 291 

— Viète’s, 149 

Fraction 

— continued, 104 

—— convergents, 104, 105 

— partial, 285, 317 

Fractional part, 54 

Function, 11-15, 418 

— additive interval, 347, 386 

— analytic at a point, 225, 281 

— asymptotically equivalent to 
another, 141 

— bounded, 112, 418, 426 

—— from above, 112 

—— from below, 112 

— characteristic, 14 

— concave, 243 

— constant, 109 

— continuous, 423-426 

—— at a point, 151-154 

—— on a set, 154, 424 

— convex, 243-245 

—— downward, 243 

—— upward, 243 

— decreasing, 137 

— differentiable at a point, 178-179 

— Dirichlet, 158, 344 


— exponential, 118-123, 190, 275, 
276, 296-302 

— force, 450 

— harmonic, 475 

— homogeneous, 452 

— hyperbolic, 201 

— implicit, 208, 212, 480-490 

— increasing, 136 

— infinite, 139 

—— of higher order, 139 

— infinitesimal, 113-115 

—— compared with another, 138 

—— of higher order, 139 

— integrable, 333 

— inverse, 16, 165-168, 199, 448, 
498 

— Lagrange, 529, 531, 536 

— locally homogeneous, 452 

— logarithmic, 123-127 

— monotonic, 136 

— nondecreasing, 136 

— nonincreasing, 137 

— of a complex variable, 276 

—— continuous, 276 

—— differentiable, 279 

— of several variables, 411 

—— differentiable, 434 

— periodic, 192, 267, 276, 368, 372 

— power, 127 

— Riemann, 158, 169, 344 

— sgn, 109 

— strictly convex, 243 

— trigonometric, 379-381 

— ultimately bounded, 112, 114, 130, 
137, 418 

— ultimately constant, 112 
— uniformly continuous, 162, 426 

Functional, 12, 14, 347 

Functional dependence, 508, 516 

Fundamental theorem of algebra, 
283-284 


G 

Geodesic, 14 

Geometric series, 96 

Germ of a function, 172 

Gradient, 446-447 

Graph of a function, 19, 170, 243-264 
— of several variables, 470, 471 
Group, 36 

— Abelian, 36, 49 

— additive, 36 

— commutative, 36 
— multiplicative, 37 


H 

Half-life, 293, 304 

Hessian, 495, 512, 542 
Hyperbolic cosine, 201 
Hyperbolic cosine integral, 327 
Hyperbolic sine, 201 
Hyperbolic sine integral, 327 


I 

Ideal of a ring, 172 

— maximal, 172 

— of functions, 172 

Identity 

— in a multiplicative group, 37 

— in the real numbers, 36, 37 

Image, 15, 21 

Imaginary part of a complex number, 
266 

Imaginary unit, 265 

Imbedding, 15 

Increment 

— of a function, 179, 435 

— of an argument, 178, 179, 435 

Indefinite integral, 307-315 

Index of a critical point, 516 

Inequality 

— Bernoulli’s, 66, 89, 240 

— Cauchy-Bunyakovskii, 358 

— Hadamard’s, 543 

— Hölder’s, 241, 250, 358 

— Jensen’s, 249 

— Minkowski’s, 242, 359, 412 

— Schwarz, 358 

— triangle, 57, 242, 412, 413, 432 

— Young’s, 241, 263, 393 

Inferior limit, 91 

Injection, 15 

Inner product, 433 

Integer part, 54 

Integral 

— cosine, 327 

— Darboux, 345 

— definite, 329-333 
— elliptic, 323-324, 326-327, 383, 
402 

—— complete, 383, 389, 409 

—— first kind, 324, 389, 408 

—— second kind, 324, 383 

—— third kind, 324 

— Euler, 402 

— Euler-Poisson, 327, 405 

— exponential, 409 

— Fresnel, 327, 369, 408 

— Gaussian, 405 

— hyperbolic cosine, 327 

— hyperbolic sine, 327 

— hyperelliptic, 323 

— improper, 393-397 

—— absolutely convergent, 399 

—— conditionally convergent, 403 

—— convergent, 394 

—— divergent, 394 

—— with more than one singularity, 
405 

— indefinite, 307-315 

— logarithmic, 314, 327 

— of a vector-valued function, 338 

— Riemann, 332-333 

— sine, 327 

— with variable upper limit, 359 

Integration, 308, 329 

— by parts, 312, 362 

—— in a definite integral, 371 

— by substitution (change of 
variable), 312, 364 

Intersection of sets, 8, 11 

Interval, 56 

— closed, 56 

— half-open, 56 

— multidimensional, 416 

— numerical, 55 

— open, 56 

— unbounded, 56 

Isomorphism, 39, 147 

Iteration, 31, 170 


J 

Jacobi matrix, 438, 441, 464 

Jacobian, 438 

— of the transition to polar 
coordinates, 500 


L 

Laplacian, 475, 480 

Law 

— Bernoulli’s, 451 

— Clapeyron’s (ideal gas), 292, 498 

— Kepler’s, 455 

— Newton’s, 173, 213, 296, 447, 
450 

— of addition of velocities, 205-208 

— of refraction, 240, 544 

— Ohm’s, 24 

— Snell’s, 239, 544 

Least upper bound, 44 

Legendre transform, 494-495 

Lemma 

— Bolzano-Weierstrass, 72, 90, 94 

— Fermat’s, 215 

— finite covering, 71 

— Hadamard’s, 476, 512 

— least upper bound, 53 

— Morse’s, 512 

— on limit points, 72 

— on nested compact sets, 417 

Length 

— of a curve, 14, 377-378 

— of a path, 377-378 

— of an ellipse, 383 

— of an interval, 52 

Level curve, 450, 482 

Level of a function, 528, 542 

Level set, 450 

Level surface, 488 

Lifetime, 305 

Limit 

— of a composite function, 133, 420 

— of a function, 107-137, 418 

— of a mapping, 418 

— of a sequence, 79-82, 85 

—— inferior, 91 

—— partial, 93 

—— superior, 91 

— over a base, 127-131 

Limits of integration, 332, 348, 359 

Linearity of the integral, 347 

Logarithm, 123, 124, 198, 199, 288 

— natural, 124, 288 

Logarithmic integral, 314, 327, 407 

Logarithmic scale, 199 


M 

Mantissa, 69 

Mapping, 12, 418 

— bijective, 16, 23 

— bounded, 112, 418, 426 

— constant, 109 

— continuous, 151-154, 423-426 

— identity, 18 

— injective, 15, 23 

— inverse, 16, 18, 199, 448, 498, 499 

—— left, 24 

—— right, 24 

— linear, 178, 182, 430-431, 440, 443, 
448, 503 

— one-to-one, 16 

— surjective, 15, 23 

— tangent, 435, 464, 503, 522 

— ultimately bounded, 112, 418 

— uniformly continuous, 162, 426 

Mass, critical, 295 

Maximum, 44, 161, 426, 463 

— constrained, 527, 538 

— local, 214, 238-239, 463-472, 544 

Mean 

— arithmetic, 106, 250, 261 

— geometric, 250, 261 

— harmonic, 96, 106, 261 

— integral, 370 

— of order p, 106, 261 

— quadratic, 106, 261 

Mesh of a partition, 331 

Method 

— dimension, 454 

— Euler’s, 298 

— gradient, 447 

— Lagrange multipliers, 529, 544 

— of exhaustion, 330 

— of least squares, 477 

— of undetermined coefficients, 286, 
299 

— Ostrogradskii’s, 324 

Metric, 412-413, 426 

— in R”, 412, 418 

Minimum, 44, 161, 426, 463 

— constrained, 527, 538 

— local, 214, 238-239, 463-472, 544 

Modulus 

— of a complex number, 266 
— of a real number, 56 

— of a vector, 176, 266 

— of continuity, 170 

Modulus of a spring, 300, 306, 386 
Monotonicity of the integral, 351 
Morphism, 12 

Multi-index, 475 

Multiplicity of a root, 285 


N 

Necessary condition for convergence, 
96 

Neighborhood 

— deleted, 108, 420 

— of a point, 57, 72, 107, 413, 418 

Node, interpolation, 234, 372 

Norm of a vector, 176, 431-434 

Normal vector, 472 

Nuclear reactor, 295 

Number 

— algebraic, 51, 52, 67, 76 

— complex, 265 

— e, 89, 102-104, 123-135, 276, 301 

— Fibonacci, 105 

— integer, 49 

— irrational, 51, 52, 67, 76 

— natural, 25, 28, 46 

—— von Neumann, 28, 31 

— negative, 43 

— $\pi$, 52, 276, 374, 380, 393 

— positive, 43 

— prime, 49 

— rational, 50, 54, 75 

— real, 35 

— transcendental, 51, 67, 76 

Number axis, 54 


O 

Operation 

— addition, 36 

— associative, 17, 36, 37 
— commutative, 36, 37 
— distributive, 37 

— multiplication, 36 

— of differentiation, 193 
— on sets, 8, 11 
Operator, 12 

— Laplacian, 475, 480 
— logical, 7, 29 
— shift, 14 

— translation, 14 

Orbit, planetary, 306 

Order 

— linear, 38, 55, 68 

— partial, 38, 68 

Order of contact, 183 

Orthogonal vectors, 433 

Oscillation, 300-303 

— damped, 303 

— harmonic, 301, 303 

— of a function 

—— at a point, 154, 424 

—— on a set, 131, 153, 419 

— of a particle in a well, 410 

— of a pendulum, 388-390, 392, 403, 
454 

— on a set, 334 

Oscillator 

— linear, 306 

— plane, 306 

Osculating circle, 264 


P 

Pair 

— ordered, 9, 28 

— unordered, 9, 27 

Parabolic mirror, 188 

Parametrization of a curve, 377, 379 

— natural, 388 

Partial derivative, 436 

Partial fraction, 285, 317 

Partial limit, 93 

Partition of an interval, 331 

— with distinguished points, 331 

Path, 377, 423, 425, 449, 470, 517, 
541 

— closed, 377 

— piecewise smooth, 378 

— simple, 377 

— simple closed, 377 

Pendulum, 388-390, 403, 454 

— cycloidal, 392, 410 

Period 

— of a function, 192, 276, 368 

— of oscillation, 389, 392, 403, 410, 
454 

— of revolution, 306 

$\pi(x)$, 137-142 


Plane, 467, 470 

— complex, 266, 276 

— tangent, 471-474, 488, 522, 523 

—— to a surface, 488, 523 

Point 

— boundary, 414 

— Chebyshev alternant, 172 

— critical, 464, 522, 543 

—— degenerate, 542 

—— nondegenerate, 512, 516 

—— saddle point, 472 

— exterior, 414 

— fixed, 169, 170 

— in R”, 412 

— interior, 414 

— limit, 72, 415 

— local maximum, 214, 238-239, 
463-472 

— local minimum, 214, 238-239, 
463-472 

— of discontinuity 

—— of a monotonic function, 166 

—— of first kind, 157 

—— of second kind, 157 

— of inflection, 247, 248, 494 

— stationary, 464, 544 

Polar coordinates, 500, 501 

Polar form of a complex number, 266 

Polynomial 

— Chebyshev, 171 

— Hermite, 234 

— Lagrange, 234, 372 

— Legendre, 372 

— of best approximation, 171, 172 

— Taylor, 226-230 

Potential 

— Newtonian, 391, 447 

— of a force, 386-388, 447 

— of a vector field, 447, 450, 544 

Power series, 270-273 

— absolutely convergent, 271-273 

— convergent, 270 

Pre-image, 15, 16, 22 

Primitive, 307-315, 329, 365, 367 

— generalized, 361 

— of a rational function, 315-319 

Principal value, 407 


Principle 

— Archimedes’, 52, 62, 74 

— Bolzano-Weierstrass, 72, 74 

— Borel-Lebesgue, 71, 74 

— Cauchy-Cantor, 71, 74 

— Fermat’s, 240, 544 

— least upper bound, 67 

— of induction, 46, 47, 57, 66 

Problem 

— Buffon needle, 393 

— Huygens’, 469, 478 

— Kepler’s (two-body), 173 

— Okun’s, 548 

Procedure 

— recursive, 18 

Product 

— Cartesian, 10, 28, 31 

— direct, 10, 28, 31 

— infinite, 148 

— inner, 433 

— of series, 272 

Projection, 10, 13, 424 

— stereographic, 540 

Property 

— global, 160, 426 

— holding ultimately (over a base), 
130-138 

— local, 158, 172, 424 


Q 
Quantifier 


— existence, 7, 29 
— universal, 7, 29 


R 

Radioactive decay, 293-295 
Radius 

— critical, 295 

— of convergence, 270 

— of curvature, 264 

Range 

— of a function, 12 

— of a relation, 20 

Rank 

— of a mapping, 503, 516 

— of a number, 64, 67 

— of a system of functions, 516 
Rational part of an integral, 324 
Real part of a complex number, 266 
Rearrangement of terms of a series, 
97, 272 

Rectification, 502 

Recursion, 18 

Refinement of a partition, 334 

Relation, 5, 19, 20 

— antisymmetric, 21 

— equality, 7, 20 

— equipollence, 25 

— equivalence, 20, 25 

— functional, 21, 22 

— inclusion, 7, 21, 68 

— order, 21, 55 

—— linear, 21 

—— partial, 21 

— reflexive, 20 

— symmetric, 20 

— transitive, 20, 22 

— transpose, 22 

Remainder in Taylor’s formula, 220-228, 374 

— Cauchy form, 221, 364 

— integral form, 363, 461, 476 

— Lagrange form, 221-228, 364, 462, 476 

— Peano form, 228, 463 

Resolution of a diffeomorphism, 
509 

Restriction of a function, 12 

Ring 

— of continuous functions, 172 

— of germs of continuous functions, 
172 

Root 

— multiplicity, 285 

— nth, 68, 119 

— of a complex number, 268 

— of a polynomial, 171, 281-284 

—— multiple, 234, 285 

Rule, l’Hôpital’s, 250 


S 

Saddle point, 472 

Secant, 183, 184 

Second mean-value theorem, 353, 
357 

Sequence, 58, 71, 80 

— bounded, 82, 87 

— Cauchy, 85, 269, 419 
— constant, 81 
— convergent, 80 
— decreasing, 87 
— divergent, 80 
— fundamental, 85, 269, 419 
— increasing, 87 
— monotonic, 87 
— nested, 71, 72 
—— of intervals, 72 
— nondecreasing, 87 
— nonincreasing, 87 
— numerical, 58 
— of closed intervals, 86 
— of nested compact sets, 417 
— ultimately constant, 82 
Series, 95 
— absolutely convergent, 97 
— convergent, 95 
—— absolutely, 97 
— divergent, 95 
— harmonic, 96 
— numerical, 95 
— power, 222, 270-273 
— Taylor, 222, 281 
Set, 5, 6, 25 
— bounded, 44, 417 
—— from above, 44 
—— from below, 44 
— Cantor, 346 
— cardinality, 74 
— closed, 413-417 
— connected, 425, 428 
—— pathwise, 425 
— countable, 74, 75 
— empty, 8, 26 
— equipollent to another, 25 
— finite, 26 
— inductive, 28, 46, 66 
— infinite, 26 
— invariant, 24 
— level, 450 
— of integrable functions, 333, 340 
— of measure zero, 342-344 
— open, 413-416, 425, 428 
— stable, 24 
— unbounded, 420 
— uncountable, 76 
Sine 
— circular, 107, 275 
— hyperbolic, 201, 275 
Sine integral, 327 
Space 
— configuration, 14 
— Euclidean, 433 
— metric, 412 
—— complete, 132, 419 
— phase, 15 
— $\mathcal{R}[a, b]$, 333, 340, 341, 347 
— R”, 411, 421, 429 
— tangent, 435, 522-526 
— vector, 340, 341, 347, 429 
Sphere, 414, 426, 427, 540 
Spherical coordinates, 501 
Stationary point, 464 
Streamline, 451 
Structure 
— Euclidean, 433 
— logical, 29 
Subsequence, 90 
Subset, 7, 28 
— empty, 8, 27 
— proper, 8 
Substitution, Euler’s, 322 
Successor, 28 
Sum 
— Darboux, 338, 345 
— of a series, 95 
—— partial, 95 
— Riemann, 330, 338 
—— lower, 338 
—— upper, 338 
Superior limit, 91 
Support of a path, 377, 381 
Surface, 470, 471, 501, 517, 521, 523 
— level, 488 
— minimal, 495 
Surjection, 15 
Symbol 
— logical, 1 
— O, 141 
— o, 139 
System of computation, 46, 61, 70 
— positional, 61, 65 


System of functions 
— dependent, 508, 516 
— independent, 508, 516 


T 

Table 

— of derivatives, 205 

— of primitives (indefinite integrals), 
311 

Tangent, 176-185, 215, 246, 480 

— hyperbolic, 203 

Tangent line, 183 

Tangent mapping, 435, 464, 503 

Tangent plane, 471-474, 488, 522, 
523 

— to a surface, 488, 523 

Tangent space, 435, 522-526 

Tangent vector, 472 

Test 

— d’Alembert’s, 100, 224 

— for extrema, 215 

— Gauss’, 149 

Theorem 

— Abel’s, 271 

— Bolzano-Cauchy, 160 

— Cantor uniform-continuity, 164 

— Cantor’s, 26 

— Cantor-Heine, 164 

— Cauchy’s finite-increment, 218 

— Chebyshev’s, 172 

— comparison 

—— for integrals, 400 

—— for series, 98 

— Darboux’, 233, 345 

— Dedekind’s, 65 

— finite-increment, 218, 219, 455 

— fundamental, of arithmetic, 67 

— Heine-Borel, 71 

— implicit function, 480-490 

— Lagrange’s finite-increment, 
216-219 

— Liouville’s, 67 

— mean-value, 217, 455 

—— first, 352, 371 

—— second, 353, 357, 371, 404 

— of dimension theory ($\Pi$-theorem), 
454 

— rank, 503 




— Rolle’s, 216, 477 

— Schröder-Bernstein, 31 

— Thales’, 55 

— Vallée Poussin’s, 171 

— Weierstrass maximum-value, 
161 

— Weierstrass’, 87 

Topology, 109 

Transform 

— involutive, 263, 494 

— Legendre, 263, 494-495 

Transformation, 12 

— Abel’s, 353 

— Galilean, 13, 24, 206-207 

— linear, 430-431, 440, 503 

— Lorentz, 13, 24, 208 

Trapezoid, curvilinear, 383 

Truth table, 4 


U 

Union of sets, 8, 11 

Unit 

— multiplicative, 40 
Unit, imaginary, 265 


V 

Value 

— of a function, 11, 21 

—— average, 370 

— principal, 407 

Variable, canonical, 496 

Vector 

— normal, 472 

— tangent, 472 

Vectors, orthogonal, 433 

Velocity 

— escape, 391 

— instantaneous, 174-177, 195 

— of light, 13, 59, 207, 240 

Volume of a solid of revolution, 
384 


W 
Work, 385 
— escape, 391 


Z 
Zero divisor, 69 